Abstract

The usefulness of joint and conditional maximum-likelihood is considered for the Rasch 
model under realistic testing conditions in which the number of examinees is very large and 
the number of items is relatively large. Conditions for consistency and asymptotic normality 
are explored, effects of model error are investigated, measures of prediction are estimated, 
and generalized residuals are developed. 


Key words: Logarithmic penalty, entropy, consistency, normal approximation 



Acknowledgements 

The authors would like to thank Matthias von Davier, Paul Holland, Sandip Sinharay, and 
Hariharan Swaminathan for their helpful comments. 





Introduction 

The Rasch model (Rasch, 1960) remains a commonly used model for analysis of item 
responses despite competition from more general two-parameter and three-parameter 
logistic (2PL and 3PL) models (Hambleton et al., 1991, chap. 2). The Rasch model has 
the attraction of relative simplicity, and parameter estimation is feasible without use of 
a parametric model for latent ability. Nonetheless, many rather basic problems affect its 
use in psychometrics. Significant problems arise in estimation of parameters, both in a 
computational sense and in terms of asymptotic theory, in realistic cases in which the 
number of items is relatively large and the number of examinees is very large. The effects 
of lack of fit need examination, as does the fundamental problem of assessment of the size 
of model error and of formally testing for lack of fit. Methods for residual analysis are also 
required that can be justified in terms of large-sample theory. 

This report examines these issues for the commonly used joint and conditional 
maximum likelihood approaches (Hambleton et al., 1991, chap. 3). To simplify matters, 
only binary responses are considered. In later reports, versions of the Rasch model for 
polytomous responses will be examined. To illustrate results, data from the October 2, 
2002, SAT® I Math and Verbal examinations are used. In these tests, the number of items 
ranges from 60 in the Math test to 78 in the Verbal test, and 446,607 individuals are 
examined. For simplicity, responses are taken as correct or other (incorrect or omitted). 
This approach is not entirely satisfactory to the extent that scoring distinguishes between 
incorrect and omitted responses except in the case of constructed responses. On the other 
hand, the very large number of observations permits an examination of the extent of model 
misfit that is not readily accomplished with fewer data, and the tests under study have the 
advantage of careful construction based on long experience. Later reports are expected 
to examine variations of the Rasch model in which incorrect responses are distinguished 
from omitted responses. Later reports are also expected to consider common extensions 
of the Rasch model in which item discrimination is not constant and to consider marginal 
estimation procedures in which the ability distribution is assumed to satisfy a parametric 
model. The choice of methods in this report reflects a desire to examine cases in which 
log-linear models are applied to directly observed data and in which the responses are as 
simple as possible. As evident in this report, even the relatively simple situation under 
study in this report leads to substantial methodological problems. 

Section 1 examines joint maximum-likelihood estimation (JMLE) for binary responses 
(Andersen, 1972; Fischer, 1981; Haberman, 1977b). Known results are reviewed and 
considered for their implications for the SAT data. It is established that, even if the model 
is true, JMLE will not lead to fully satisfactory approximate confidence intervals for item 
difficulties and that the normal approximation for the distribution of ability estimates will 
not be fully satisfactory. These results may not be evident from the author’s previous 
work, although no actual contradiction is involved (Haberman, 1977b). It is also shown 
that some problems will arise in use of normal approximations for estimates of the expected 
log penalty function (Gilula & Haberman, 1994; Gilula & Haberman, 1995). Behavior of 
JMLE is established in cases in which model error is present, and possible use of generalized 
residuals (Haberman, 1978) is considered. Results for generalized residuals are found to be 
somewhat discouraging. 

Section 2 examines conditional maximum-likelihood estimation (CMLE) for binary 
responses (Andersen, 1972; Andersen, 1973a; Andersen, 1973b; Fischer, 1981). The basic 
properties of conditional maximum-likelihood estimates are reviewed, and computation 
with the Newton-Raphson algorithm is described. It is shown that convolutions can be 
used to yield a version of the Newton-Raphson algorithm that is computationally efficient 
(Liou, 1994). It is also shown that appropriate starting values are readily obtained, at least 
for large numbers of items, by use of joint estimation. Normal approximations for estimates 
of item difficulty are established whether the model is true and/or whether the number 
of items increases, and satisfactory estimates are developed for the expected log penalty 
function. Satisfactory generalized residuals are determined, and a simple generalization of 
the Rasch model is developed that permits assessment of lack of fit for sample sizes and 
numbers of items encountered with the SAT I Math and Verbal examinations. 

Section 3 considers some implications of CMLE to latent-structure Rasch models 
(Cressie & Holland, 1983; Tjur, 1982). The difference between the log-linear model 
corresponding to the CMLE approach and the Rasch model is considered. It is shown 
that many latent-structure models can yield the same observed joint distribution of item 
responses, and it is shown that the joint distribution can typically be produced by use of a 
latent-class model with the number of latent classes equal to about half the 
number of items. 

Section 4 summarizes the implications of the research for psychometric practice and 
discusses some further areas of possible development. 


1 Joint Maximum-likelihood Estimation 

To describe joint maximum-likelihood estimation, let examinees $i$ from 1 to $n \ge 2$ 
provide responses $Y_{ij}$ equal to 1 or 0 to items $j$ from 1 to $q \ge 2$. Normally $Y_{ij}$ is 1 for 
a correct response of subject $i$ to item $j$, and $Y_{ij}$ is 0 otherwise. Assume that associated 
with examinee $i$ is a real ability parameter $\theta_i$. Let the $\theta_i$ be independent and identically 
distributed random variables with distribution function $D$, so that $D(x)$ is the probability 
that $\theta_i \le x$ for each real $x$. For simplicity, assume that, for some bounded real interval $\Theta$, 
$\theta_i$ is in $\Theta$ for each examinee $i$. Assume that the $nq$ responses $Y_{ij}$, $1 \le i \le n$, $1 \le j \le q$, are 
conditionally independent given the $n$ ability parameters $\theta_i$, $1 \le i \le n$. Assume that the 
conditional distribution of response $Y_{ij}$ for a particular examinee $i$ and item $j$ given the 
$n$ ability parameters $\theta_h$, $1 \le h \le n$, depends only on the ability parameter $\theta_i$ for examinee 
$i$. Let the logistic distribution function lgt be defined on the real line so that 

$$\operatorname{lgt}(x) = [1 + \exp(-x)]^{-1}$$

for all real $x$. For each item $j$, let $\beta_j$ be a fixed parameter that measures the difficulty of 
item $j$. Let $\beta$ denote the $q$-dimensional vector with coordinates $\beta_j$ for $1 \le j \le q$. Given 
that $\theta_i = \theta$ for a real number $\theta$, let the conditional probability $P(Y_{ij} = 1 \mid \theta_i = \theta)$ that 
$Y_{ij} = 1$ be 

$$\operatorname{lgt}(\theta - \beta_j),$$

so that the conditional log odds that $Y_{ij} = 1$ given that $\theta_i = \theta$ is 

$$\log \frac{P(Y_{ij} = 1 \mid \theta_i = \theta)}{P(Y_{ij} = 0 \mid \theta_i = \theta)} = \theta - \beta_j.$$

The random variable 

$$p_{ij} = \operatorname{lgt}(\theta_i - \beta_j)$$

may be employed to characterize the conditional distribution of $Y_{ij}$ given $\theta_i$. To ensure 
identifiability of the item difficulty parameters under the estimation procedures in this 
report, it is convenient to assume that $\beta_1 = 0$. In addition, assume that the number of 
observations $n$ exceeds the number of items $q$. In large-sample results in which $q$ increases, 
also assume that the empirical distribution of the $\beta_j$, $1 \le j \le q$, converges weakly to the 
distribution of a bounded random variable $\beta^*$. Without loss of generality, $\beta^*$ can be defined 
to be independent of the $\theta_i$ and $Y_{ij}$. 
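
As a minimal numerical sketch of the response model just described (it is not taken from the report), the following Python fragment evaluates the logistic distribution function and the probability of a correct response for one examinee. The function names `lgt` and `rasch_prob` and all numerical values are illustrative assumptions.

```python
import math

def lgt(x):
    """Logistic distribution function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def rasch_prob(theta, beta):
    """Probability of a correct response to each item for ability theta."""
    return [lgt(theta - b) for b in beta]

# Hypothetical item difficulties, with the identifying constraint beta[0] = 0.
beta = [0.0, 0.5, -1.0]
theta = 0.3
p = rasch_prob(theta, beta)

# The log odds of each probability recover theta - beta_j.
log_odds = [math.log(pj / (1.0 - pj)) for pj in p]
```

Under these assumptions, `log_odds` reproduces the differences between the ability and each item difficulty, which is the defining property of the model.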

Under the proposed model, a marginal log likelihood function can be constructed with 
little difficulty, at least if computational considerations are ignored. Let $Y_i$ denote the 
$q$-dimensional vector with coordinates $Y_{ij}$, $1 \le j \le q$, so that each $Y_i$ is in the set $\Gamma$ of 
$q$-dimensional vectors with coordinates 0 or 1. Let $c$ be in $\Gamma$, let $Y_{i+} = \sum_{j=1}^q Y_{ij}$ be the 
number of items correctly answered by examinee $i$, and let $k = \sum_{j=1}^q c_j$. Let 

$$x^T y = \sum_{j=1}^q x_j y_j$$

for $q$-dimensional vectors $x$ and $y$. Let 

$$\Psi(\beta, \theta) = \frac{1}{\prod_{j=1}^q [1 + \exp(\theta - \beta_j)]}$$

for real $\theta$. Given customary notation for a Lebesgue-Stieltjes integral, the probability that 
$Y_i = c$ is the expected value 

$$p_J(c) = E\left(\exp\left\{\sum_{j=1}^q [c_j \log p_{ij} + (1 - c_j)\log(1 - p_{ij})]\right\}\right)
= E\left(\frac{\exp(k\theta_i - \beta^T c)}{\prod_{j=1}^q [1 + \exp(\theta_i - \beta_j)]}\right)
= \exp(-\beta^T c)\int_{-\infty}^{\infty} e^{k\theta}\,\Psi(\beta, \theta)\,dD(\theta)$$

(Cressie & Holland, 1983). For $p_J$ equal to the array of $p_J(c)$ for $c$ in $\Gamma$, the marginal log 
likelihood function is then 

$$\ell(p_J) = \sum_{i=1}^n \log p_J(Y_i).$$

For an alternative expression, let 

$$Y_+ = \sum_{i=1}^n Y_i.$$

For integers $k$, $0 \le k \le q$, let $\Gamma(k)$ be the subset of $c$ in $\Gamma$ such that $\sum_{j=1}^q c_j = k$. Under 
the constraint that 

$$p_J(c) = \exp(-\beta^T c)\int_{-\infty}^{\infty} \exp(k\theta)\,\Psi(\beta, \theta)\,dD(\theta)$$

for $c \in \Gamma(k)$ and $0 \le k \le q$ for some $\beta$ with $\beta_1 = 0$ and some distribution function $D$, 

$$\ell(p_J) = -\beta^T Y_+ + \sum_{i=1}^n \log \int_{-\infty}^{\infty} \exp(Y_{i+}\theta)\,\Psi(\beta, \theta)\,dD(\theta). \tag{1}$$

Let $\ell_M$ be the maximum of $\ell(p_J)$ subject to the constraints used in (1). Then $\hat\beta_J$ and $\hat D_J$ 
are joint unrestricted marginal maximum-likelihood estimates of $\beta$ and $D$ if 

$$\ell(\hat p_J) = \ell_M,$$

the first coordinate $\hat\beta_{1J}$ of $\hat\beta_J$ is 0, and $\hat p_J$ is defined so that 

$$\hat p_J(c) = \exp(-\hat\beta_J^T c)\int_{-\infty}^{\infty} e^{k\theta}\,\Psi(\hat\beta_J, \theta)\,d\hat D_J(\theta)$$

for $c$ in $\Gamma(k)$ for $0 \le k \le q$. 

Because determination of possible values of $\hat\beta_J$ and $\hat D_J$ is far from trivial, alternative 
approaches are commonly pursued. In this section, JMLE is considered. In Section 2, 
CMLE is developed. This report does not consider approaches in which parametric models 
are used for the distribution function D. It is planned that such approaches will be 
discussed in later reports. 

In JMLE, estimation is performed with the $\theta_i$ regarded as fixed parameters, a practice 
with a long history of controversy in many areas of statistics (Kiefer & Wolfowitz, 1956). 
The joint log likelihood function 

$$\ell_J(p) = \sum_{i=1}^n \sum_{j=1}^q [Y_{ij}\log p_{ij} + (1 - Y_{ij})\log(1 - p_{ij})]$$

is maximized subject to the constraints that the array $p$ of 

$$p_{ij} = \operatorname{lgt}(\mu_{ij}), \quad 1 \le i \le n, \ 1 \le j \le q, \tag{2}$$

satisfies 

$$\mu_{ij} = \theta_i - \beta_j, \quad 1 \le i \le n, \ 1 \le j \le q, \tag{3}$$

the $\theta_i$ are real for $1 \le i \le n$, the $\beta_j$ are real for $1 \le j \le q$, and $\beta_1 = 0$. Under these 
constraints, 

$$\ell_J(p) = -\beta^T Y_+ + \sum_{i=1}^n \log[\exp(Y_{i+}\theta_i)\,\Psi(\beta, \theta_i)]. \tag{4}$$

For any distribution function $D$ and any $q$-dimensional vector $\beta$ with $\beta_1 = 0$, the maximum 
of $\exp(Y_{i+}\theta)\Psi(\beta, \theta)$ over $\theta$ is at least the integral 

$$\int_{-\infty}^{\infty} \exp(Y_{i+}\theta)\,\Psi(\beta, \theta)\,dD(\theta),$$

with equality only if the maximum of $\exp(Y_{i+}\theta)\Psi(\beta, \theta)$ is achieved by some $\theta$, $D(x) = 0$ for 
$x < \theta$, and $D(x) = 1$ for $x \ge \theta$ (Haberman, 1977b). Let $\ell_{JM}$ be the maximum value of 
$\ell_J(p)$ subject to the model constraints (2), (3), and $\beta_1 = 0$. Then the maximum $\ell_{JM}$ is at 
least $\ell_M$. It is readily verified that $\ell_{JM} > \ell_M$ unless each $Y_{i+}$ has the same value. Thus 
joint maximum-likelihood differs somewhat from marginal maximum likelihood in terms of 
the function to be maximized. 
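
The agreement between the cell-by-cell joint log likelihood and the sufficient-statistic form labeled (4) can be checked numerically. The sketch below is only an illustration under assumed data: the function names `joint_loglik` and `joint_loglik_sufficient`, the response matrix, and the parameter values are hypothetical.

```python
import math

def lgt(x):
    return 1.0 / (1.0 + math.exp(-x))

def joint_loglik(Y, theta, beta):
    """Sum over i and j of Y_ij log p_ij + (1 - Y_ij) log(1 - p_ij)."""
    total = 0.0
    for row, t in zip(Y, theta):
        for y, b in zip(row, beta):
            p = lgt(t - b)
            total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def joint_loglik_sufficient(Y, theta, beta):
    """Form (4): -beta'Y_+ + sum_i [Y_i+ theta_i + log Psi(beta, theta_i)]."""
    col_totals = [sum(row[j] for row in Y) for j in range(len(beta))]
    total = -sum(b * c for b, c in zip(beta, col_totals))
    for row, t in zip(Y, theta):
        log_psi = -sum(math.log(1.0 + math.exp(t - b)) for b in beta)
        total += sum(row) * t + log_psi
    return total

Y = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]   # hypothetical 0/1 responses
theta = [-0.4, 0.6, 0.9]                # hypothetical abilities
beta = [0.0, 0.3, 1.1]                  # hypothetical difficulties, beta[0] = 0

direct = joint_loglik(Y, theta, beta)
via_sums = joint_loglik_sufficient(Y, theta, beta)
```

The two values agree up to rounding, which reflects the fact that the joint log likelihood depends on the data only through the examinee totals and the item totals.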

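If it exists, then the joint maximum-likelihood estimate $\hat p$ of $p$ is the array of joint 
maximum-likelihood estimates $\hat p_{ij}$ of $p_{ij}$, $1 \le i \le n$, $1 \le j \le q$, that satisfies 

$$\hat p_{ij} = \operatorname{lgt}(\hat\mu_{ij}), \tag{5}$$

$$\hat\mu_{ij} = \hat\theta_i - \hat\beta_j, \quad 1 \le i \le n, \ 1 \le j \le q, \tag{6}$$

$\hat\theta_i$ is real for $1 \le i \le n$, $\hat\beta_j$ is real for $1 \le j \le q$, $\hat\beta_1 = 0$, and 

$$\ell_J(\hat p) = \ell_{JM}.$$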


Let $\hat\beta$ be the $q$-dimensional vector with coordinates $\hat\beta_j$. If 

$$Y_{+j} = \sum_{i=1}^n Y_{ij},$$

then the maximum-likelihood equations 

$$\hat p_{i+} = \sum_{j=1}^q \hat p_{ij} = Y_{i+} \tag{7}$$

and 

$$\hat p_{+j} = \sum_{i=1}^n \hat p_{ij} = Y_{+j} \tag{8}$$

are satisfied (Haberman, 1977b). If real $\theta'_i$ and $\beta'_j$ exist such that $\beta'_1 = 0$, 

$$\mu'_{ij} = \theta'_i - \beta'_j,$$

$$p'_{ij} = \operatorname{lgt}(\mu'_{ij}),$$

$$p'_{i+} = \sum_{j=1}^q p'_{ij} = Y_{i+},$$

and 

$$p'_{+j} = \sum_{i=1}^n p'_{ij} = Y_{+j},$$

then $\theta'_i = \hat\theta_i$ and $\beta'_j = \hat\beta_j$, so that $\mu'_{ij} = \hat\mu_{ij}$ and $p'_{ij} = \hat p_{ij}$. This result implies that, if they 
exist, joint maximum-likelihood estimates are uniquely defined. The case of $1 \le h < i \le n$, 
$\beta'_j = \hat\beta_j$, $\theta'_g = \hat\theta_g$ for $g$ not $h$ or $i$, $\theta'_h = \hat\theta_i$, and $\theta'_i = \hat\theta_h$ implies that $\hat\theta_i = \hat\theta_h$ whenever 
$Y_{i+} = Y_{h+}$. 

If any $Y_{i+}$ is 0 or $q$, so that examinee $i$ answers no item correctly or answers all items 
correctly, then joint maximum-likelihood estimates cannot exist, for each $\hat p_{ij}$ must be 
positive and less than 1, so that $0 < \hat p_{i+} < q$ and $\hat p_{i+}$ cannot satisfy the maximum-likelihood 
equation $\hat p_{i+} = Y_{i+}$. This matter is not a purely academic issue. For the SAT Math data 
under study, one examinee had no correct response, and 646 examinees answered all items 
correctly. For the SAT Verbal exam, two examinees answered no item correctly, and 29 
examinees answered all items correctly. The issue of existence of joint maximum-likelihood 
estimates will be discussed more thoroughly after consideration of collapsed tables. 

1.1 Computations and Collapsed Tables 

Computation of joint maximum-likelihood estimates is greatly simplified by use of a 
collapsed table based on the counts $f_{kj}$, $0 \le k \le q$, $1 \le j \le q$. Here $f_{kj}$ is the number of 
examinees $i$, $1 \le i \le n$, with total number correct $Y_{i+} = k$ and with 
a correct response to item $j$ ($Y_{ij} = 1$). For integers $k$ from 0 to $q$, let $n_k \ge 0$ be the number 
of examinees $i$ with total number correct $Y_{i+} = k$. Let $K$ be the set of integers $k$ such 
that $n_k > 0$, and let $N_K$ be the number of elements of $K$, so that $N_K \le q + 1$. If joint 
maximum-likelihood estimates exist, then $n_0 = n_q = 0$, so that $N_K \le q - 1$. Consider a 
table with entries $f_{kj}$, $k$ in $K$, $1 \le j \le q$. As shown in this section, computation of joint 
maximum-likelihood estimates of $\theta_i$ and $\beta_j$ is then based on a logit model in which the 
observations $f_{kj}$, $k$ in $K$, $1 \le j \le q$, are regarded as independent binomial random variables 
with respective sample sizes $n_k$ and probabilities 

$$p_{kjC} = \operatorname{lgt}(\theta_{kC} - \beta_{jC})$$

for some real $\theta_{kC}$, $k$ in $K$, and $\beta_{jC}$, $1 \le j \le q$, such that $\beta_{1C} = 0$. If joint maximum- 
likelihood estimates exist, then the maximum-likelihood estimate $\hat\beta_{jC}$ of $\beta_{jC}$ exists and equals 
the joint maximum-likelihood estimate $\hat\beta_j$, and the maximum-likelihood estimate $\hat\theta_{kC}$ of $\theta_{kC}$, 
$k$ in $K$, exists and equals the common value of $\hat\theta_i$ for $Y_{i+} = k$. The gain here is that 
computation for an $n$ by $q$ array of binary responses is reduced to computation for an $N_K$ 
by $q$ array of binomial responses. Because the number $n$ of examinees is typically much larger 
than is the number $q$ of items, the computational savings are very large. The collapsed 
table is also useful in verification that joint maximum-likelihood estimates exist at all. In 
cases of nonexistence, the collapsed table remains important in computations. 
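
The collapsed table can be built in a single pass over the response matrix. The following sketch is illustrative only: the function name `collapse` and the small response matrix are assumptions, and a production version for several hundred thousand examinees would more likely use an array library.

```python
def collapse(Y):
    """Return (n_k, f): n_k[k] counts examinees with total score k, and
    f[k][j] counts those examinees who also answered item j correctly."""
    q = len(Y[0])
    n_k = [0] * (q + 1)
    f = [[0] * q for _ in range(q + 1)]
    for row in Y:
        k = sum(row)              # total number correct for this examinee
        n_k[k] += 1
        for j, y in enumerate(row):
            f[k][j] += y
    return n_k, f

# Hypothetical 0/1 responses for six examinees and three items.
Y = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0]]
n_k, f = collapse(Y)
K = [k for k, count in enumerate(n_k) if count > 0]   # observed total scores
```

However many examinees there are, the table has at most q + 1 rows, which is the source of the computational savings described above.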

To verify the relationship between joint maximum-likelihood estimation and the logit 
model for the $f_{kj}$, some preliminary results concerning summations of the counts $f_{kj}$ are 
needed. Let the indicator variable $I_{ik}$ be 1 for $Y_{i+} = k$ and 0 otherwise. Then 

$$n_k = \sum_{i=1}^n I_{ik}, \qquad f_{kj} = \sum_{i=1}^n Y_{ij} I_{ik}.$$

It follows that the row summation 

$$f_{k+} = \sum_{j=1}^q f_{kj}
= \sum_{j=1}^q \sum_{i=1}^n Y_{ij} I_{ik}
= \sum_{i=1}^n I_{ik} \sum_{j=1}^q Y_{ij}
= k \sum_{i=1}^n I_{ik}
= k n_k,$$

for $0 \le k \le q$, and, for $1 \le j \le q$, the column summation 

$$f_{+j} = \sum_{k=0}^q f_{kj}
= \sum_{k=0}^q \sum_{i=1}^n Y_{ij} I_{ik}
= \sum_{i=1}^n Y_{ij} \sum_{k=0}^q I_{ik}
= Y_{+j},$$

the number of examinees $i$ with response $j$ correct ($Y_{ij} = 1$). Because $f_{kj} = 0$ if $n_k = 0$, 

$$f_{+j} = \sum_{k \in K} f_{kj}.$$


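For the logit model for the $f_{kj}$, $k$ in $K$, $1 \le j \le q$, the log likelihood function is 

$$\ell_{JC}(p_C) = \sum_{k \in K} \sum_{j=1}^q [f_{kj}\log p_{kjC} + (n_k - f_{kj})\log(1 - p_{kjC})]$$

for arrays $p_C$ with elements $p_{kjC}$, $k$ in $K$, $1 \le j \le q$, such that 

$$p_{kjC} = \operatorname{lgt}(\mu_{kjC}),$$

$$\mu_{kjC} = \theta_{kC} - \beta_{jC},$$

$\theta_{kC}$ and $\beta_{jC}$ are real, and $\beta_{1C} = 0$. Let $\ell_{JCM}$ be the supremum of $\ell_{JC}$. If $\ell_{JC}(\hat p_C) = \ell_{JCM}$ 
for 

$$\hat p_{kjC} = \operatorname{lgt}(\hat\mu_{kjC}) \tag{9}$$

and 

$$\hat\mu_{kjC} = \hat\theta_{kC} - \hat\beta_{jC}, \tag{10}$$

if $\hat\theta_{kC}$ and $\hat\beta_{jC}$ are real, and if $\hat\beta_{1C} = 0$, then the likelihood equations 

$$\sum_{k \in K} n_k \hat p_{kjC} = f_{+j} = Y_{+j}, \quad 1 \le j \le q, \tag{11}$$

and 

$$n_k \sum_{j=1}^q \hat p_{kjC} = f_{k+} = k n_k, \quad k \in K, \tag{12}$$

must hold (Haberman, 1978, pp. 198, 294-295), so that 

$$\sum_{j=1}^q \hat p_{kjC} = k. \tag{13}$$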

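If real $\theta'_{kC}$, $k$ in $K$, and $\beta'_{jC}$, $1 \le j \le q$, exist such that $\beta'_{1C} = 0$, 

$$\mu'_{kjC} = \theta'_{kC} - \beta'_{jC},$$

$$p'_{kjC} = \operatorname{lgt}(\mu'_{kjC}),$$

$$\sum_{j=1}^q p'_{kjC} = k,$$

and 

$$\sum_{k \in K} n_k p'_{kjC} = Y_{+j},$$

then $\mu'_{kjC} = \hat\mu_{kjC}$, $\theta'_{kC} = \hat\theta_{kC}$, $\beta'_{jC} = \hat\beta_{jC}$, and $p'_{kjC} = \hat p_{kjC}$, so that the maximum-likelihood 
estimate $\hat p_C$ of $p_C$ is uniquely defined. 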

The relationship of the $\hat\theta_{kC}$ and $\hat\beta_{jC}$ to joint maximum-likelihood estimates is 
straightforward. Suppose that the joint maximum-likelihood estimates $\hat\theta_i$, $\hat\beta_j$, $\hat\mu_{ij}$, and $\hat p_{ij}$ 
exist. Because $\hat\theta_i = \hat\theta_h$ for $Y_{i+} = Y_{h+}$, the equations (9), (10), (11), and (13) that define 
$\hat p_{kjC}$, $\hat\mu_{kjC}$, $\hat\theta_{kC}$, and $\hat\beta_{jC}$ are satisfied and $\hat\beta_{1C} = 0$ if $\hat\theta_{kC}$ is the common value of $\hat\theta_i$ for each 
examinee $i$ with $Y_{i+} = k$, $\hat\beta_{jC} = \hat\beta_j$ for each item $j$, $\hat\mu_{kjC}$ is the common value of $\hat\mu_{ij}$ for 
$Y_{i+} = k$, and $\hat p_{kjC}$ is the common value of $\hat p_{ij}$ for $Y_{i+} = k$. Thus $\hat\theta_{kC}$, $\hat\beta_{jC}$, $\hat\mu_{kjC}$, and $\hat p_{kjC}$ 
are maximum-likelihood estimates for $\theta_{kC}$, $\beta_{jC}$, $\mu_{kjC}$, and $p_{kjC}$, respectively. Conversely, if 
$\hat\theta_{kC}$, $\hat\beta_{jC}$, $\hat\mu_{kjC}$, and $\hat p_{kjC}$ are maximum-likelihood estimates for $\theta_{kC}$, $\beta_{jC}$, $\mu_{kjC}$, and $p_{kjC}$, 
respectively, so that (9), (10), (11), and (13) hold and $\hat\beta_{1C} = 0$, then (5), (6), (7), and (8) 
are satisfied and $\hat\beta_1 = 0$ if $\hat\theta_i = \hat\theta_{kC}$ for examinees $i$ with $Y_{i+} = k$ and $\hat\beta_j = \hat\beta_{jC}$ for items $j$, 
so that the $\hat\beta_j$, $\hat\theta_i$, $\hat\mu_{ij}$, and $\hat p_{ij}$ are joint maximum-likelihood estimates of $\beta_j$, $\theta_i$, $\mu_{ij}$, and $p_{ij}$, 
respectively. Thus joint maximum-likelihood estimates are readily found by maximization 
of $\ell_{JC}$. 
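
One simple way to solve the likelihood equations for the collapsed table is to alternate between the score-group equations and the item equations. The sketch below is not the algorithm used in the report; it is a hypothetical coordinate-wise scheme with bisection, chosen for transparency rather than speed, and it assumes that conventional estimates exist. The function names, the bracketing interval, and the data are assumptions.

```python
import math

def lgt(x):
    return 1.0 / (1.0 + math.exp(-x))

def solve_increasing(func, target, lo=-40.0, hi=40.0, iters=100):
    """Bisection for func(x) = target, where func is increasing in x."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def jmle_collapsed(n_k, item_totals, q, sweeps=200):
    """n_k: dict {score k: count} with 0 < k < q; item_totals: list of Y_+j."""
    theta = {k: 0.0 for k in n_k}
    beta = [0.0] * q
    for _ in range(sweeps):
        # Score-group equations: sum_j lgt(theta_k - beta_j) = k.
        for k in theta:
            theta[k] = solve_increasing(
                lambda t: sum(lgt(t - b) for b in beta), k)
        # Item equations: sum_k n_k lgt(theta_k - beta_j) = Y_+j,
        # solved in m = -beta_j so that the left side is increasing.
        for j in range(q):
            beta[j] = -solve_increasing(
                lambda m: sum(n * lgt(theta[k] + m) for k, n in n_k.items()),
                item_totals[j])
        # Impose the identifying constraint beta_1 = 0 without changing any logit.
        shift = beta[0]
        beta = [b - shift for b in beta]
        theta = {k: t - shift for k, t in theta.items()}
    return theta, beta

# Hypothetical collapsed data: six examinees with score 1, six with score 2.
n_k = {1: 6, 2: 6}
item_totals = [8, 6, 4]
theta, beta = jmle_collapsed(n_k, item_totals, q=3)
```

Each examinee with total score k then receives the ability estimate `theta[k]`, so the cost of the iteration does not depend on the number of examinees.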

1.2 Existence of Joint Maximum-likelihood Estimates 

As already noted, joint maximum-likelihood estimates do not exist if $n_0$ or $n_q$ is positive, so 
that some examinee answers all items incorrectly or all items correctly. A necessary and 
sufficient condition for existence is provided by the following theorem (Haberman, 1977b; 
Fischer, 1981). 

Theorem 1 Joint maximum-likelihood estimates exist if, and only if, there is no real $a$ and 
$b$ such that the following conditions hold: 

1. $a$ and $b$ are not integers, 

2. $0 < a < q$ and $0 < b < n$, 

3. $f_{kj} = 0$ for $k < a$ and $Y_{+j} < b$, 

4. $f_{kj} = n_k$ for $k > a$ and $Y_{+j} > b$, 

5. $k$ and $j$ exist such that $n_k > 0$ and either $k < a$ and $Y_{+j} < b$ or $k > a$ and $Y_{+j} > b$. 

Several basic cases should be noted. If $n_0 > 0$, then the case $a = 0.5$ and $b = n - 0.5$ 
shows that joint maximum-likelihood estimates do not exist, for $Y_{+j} \le n - n_0 < b$ and 
$f_{0j} = 0$ for $1 \le j \le q$. If $n_q > 0$, then the case $a = q - 0.5$ and $b = 0.5$ shows that joint 
maximum-likelihood estimates do not exist, for $Y_{+j} \ge n_q > b$ and $f_{qj} = n_q$ for $1 \le j \le q$. 
Similar arguments show that joint maximum-likelihood estimates do not exist if $Y_{+j}$ is 0 
or $n$ for some $j$ from 1 to $q$. If $n_0 = n_q = 0$ and if to each $j$ corresponds a $k$ such that 
$0 < f_{kj} < n_k$ and $1 \le k \le q - 1$, then joint maximum-likelihood estimates exist. 
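
The conditions of Theorem 1 depend on $a$ and $b$ only through which scores $k$ and which item totals $Y_{+j}$ fall below or above them, so a finite search over half-integers suffices. The sketch below is a direct, unoptimized transcription of the five conditions as stated above; the function name `jmle_exists` and the test tables are hypothetical, and the code is meant for small illustrative tables only.

```python
def jmle_exists(f, n_k):
    """f[k][j] and n_k[k] for k = 0..q; returns True if no (a, b) satisfies
    conditions 1-5 of the existence theorem."""
    q = len(f) - 1
    n = sum(n_k)
    col = [sum(f[k][j] for k in range(q + 1)) for j in range(q)]   # Y_+j
    a_vals = [k + 0.5 for k in range(q)]                           # 0 < a < q
    b_vals = sorted({v + d for v in col for d in (-0.5, 0.5) if 0 < v + d < n})
    for a in a_vals:
        for b in b_vals:
            low = [(k, j) for k in range(q + 1) for j in range(q)
                   if k < a and col[j] < b]
            high = [(k, j) for k in range(q + 1) for j in range(q)
                    if k > a and col[j] > b]
            cond3 = all(f[k][j] == 0 for k, j in low)
            cond4 = all(f[k][j] == n_k[k] for k, j in high)
            cond5 = any(n_k[k] > 0 for k, j in low + high)
            if cond3 and cond4 and cond5:
                return False
    return True

f = [[0, 0, 0], [3, 2, 1], [5, 4, 3], [0, 0, 0]]   # hypothetical counts f_kj
exists_interior = jmle_exists(f, [0, 6, 6, 0])     # all scores strictly between 0 and q
exists_with_zero = jmle_exists(f, [1, 6, 6, 0])    # one examinee with score 0
```

In the second call the pair a = 0.5, b = 8.5 satisfies all five conditions, which matches the basic case of an examinee with no correct responses.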

1.3 Extended Joint Maximum-likelihood Estimates 

The definition of joint maximum-likelihood estimates may be extended to yield a 
unique estimate $\hat p$ of the probability array $p$ without any conditions (Haberman, 1974, pp. 
402-404). The resulting estimate, termed the extended joint maximum-likelihood estimate 
of $p$, is uniquely defined by the conditions that 

$$\ell_J(\hat p) = \ell_{JM},$$

$\hat p$ has elements $\hat p_{ij}$ for $1 \le i \le n$ and $1 \le j \le q$, and real $\theta_{i\nu}$, $1 \le i \le n$, and real $\beta_{j\nu}$, $1 \le j \le q$, 
exist for $\nu \ge 1$ such that $\beta_{1\nu} = 0$ and $\operatorname{lgt}(\theta_{i\nu} - \beta_{j\nu})$ approaches $\hat p_{ij}$, $1 \le i \le n$, $1 \le j \le q$, as 
$\nu$ approaches $\infty$. The estimates $\hat p_{ij}$ satisfy (7) and (8), just as in the case of conventional 
joint maximum-likelihood estimates. If the conventional joint maximum-likelihood 
estimate $\hat p$ of $p$ is defined, then the definition implies that $\hat p$ is also the extended joint 
maximum-likelihood estimate of $p$. 

On the other hand, if $\theta'_{i\nu}$, $1 \le i \le n$, and $\beta'_{j\nu}$, $1 \le j \le q$, are real for $\nu \ge 1$, $\beta'_{1\nu} = 0$, 
$\operatorname{lgt}(\theta'_{i\nu} - \beta'_{j\nu})$ approaches $p'_{ij}$ for $1 \le i \le n$, $1 \le j \le q$, as $\nu$ approaches $\infty$, $p'_{i+} = Y_{i+}$, and 
$p'_{+j} = Y_{+j}$, then $p'_{ij} = \hat p_{ij}$. 

The definition of $\hat p$ implies that $0 \le \hat p_{ij} \le 1$, so that $\hat p_{ij} = 0$ if $Y_{i+} = 0$ or $Y_{+j} = 0$ and 
$\hat p_{ij} = 1$ if $Y_{i+} = q$ or $Y_{+j} = n$. 

In terms of the collapsed table of counts $f_{kj}$, $k$ in $K$, $1 \le j \le q$, extended 
maximum-likelihood estimates may also be defined. There is a unique $\hat p_C$ such that 

$$\ell_{JC}(\hat p_C) = \ell_{JCM},$$

$\hat p_C$ has elements $\hat p_{kjC}$ for $k$ in $K$ and $1 \le j \le q$, and real $\theta_{kC\nu}$, $k$ in $K$, and $\beta_{jC\nu}$, $1 \le j \le q$, are 
defined for $\nu \ge 1$ so that $\beta_{1C\nu} = 0$ and $\operatorname{lgt}(\theta_{kC\nu} - \beta_{jC\nu})$ approaches $\hat p_{kjC}$ as $\nu$ approaches 
$\infty$. The equations (11) and (13) hold. If it exists, the conventional maximum-likelihood 
estimate of $p_C$ is also the extended maximum-likelihood estimate. 

If real $\theta'_{kC\nu}$, $k$ in $K$, and $\beta'_{jC\nu}$, $1 \le j \le q$, exist for $\nu \ge 1$ such that $\beta'_{1C\nu} = 0$ and 
$\operatorname{lgt}(\theta'_{kC\nu} - \beta'_{jC\nu})$ approaches $p'_{kjC}$ for $k$ in $K$, $1 \le j \le q$, as $\nu$ approaches $\infty$, if 

$$\sum_{j=1}^q p'_{kjC} = k,$$

and if 

$$\sum_{k \in K} n_k p'_{kjC} = f_{+j},$$

then $p'_{kjC} = \hat p_{kjC}$. 

The relationship between estimates is quite straightforward. Nearly the same arguments 
used for conventional joint maximum-likelihood estimates show that $\hat p_{ij} = \hat p_{kjC}$ if $Y_{i+} = k$. 
It follows that $\hat p_{0jC} = 0$ if $n_0 > 0$ and $\hat p_{qjC} = 1$ if $n_q > 0$. 

Extended maximum-likelihood is a bit more complicated when parameters $\theta_i$, $\theta_{kC}$, $\beta_j$, 
$\beta_{jC}$, $\mu_{ij}$, and $\mu_{kjC}$ are considered. For $\mu_{ij}$ and $\mu_{kjC}$, a reasonable definition is available if 
infinite values are permitted. One has 

$$\hat\mu_{ij} = \hat\mu_{kjC} = \begin{cases}
\infty, & \hat p_{kjC} = 1, \\
\log\left(\dfrac{\hat p_{kjC}}{1 - \hat p_{kjC}}\right), & 0 < \hat p_{kjC} < 1, \\
-\infty, & \hat p_{kjC} = 0,
\end{cases}$$

for $Y_{i+} = k$. Because $\theta_i = \mu_{i1}$ and $\theta_{kC} = \mu_{k1C}$, it also follows that 

$$\hat\theta_i = \hat\theta_{kC} = \begin{cases}
\infty, & \hat p_{k1C} = 1, \\
\log\left(\dfrac{\hat p_{k1C}}{1 - \hat p_{k1C}}\right), & 0 < \hat p_{k1C} < 1, \\
-\infty, & \hat p_{k1C} = 0
\end{cases}$$

if $Y_{i+} = k$. Conventions for $\hat\beta_j = \hat\beta_{jC}$ are somewhat more complicated. If conventional joint 
maximum-likelihood estimates exist, then, because $\mu_{ij} = \theta_i - \beta_j$ and $\beta_1 = 0$, 

$$\hat\beta_j = \hat\beta_{jC} = \hat\mu_{i1} - \hat\mu_{ij} = \hat\mu_{k1C} - \hat\mu_{kjC}$$

for $Y_{i+} = k$ and $k$ in $K$. This formula is applicable to extended joint maximum-likelihood 
estimation as long as some integer $k$ in $K$ exists such that $\hat p_{k1C}$ and $\hat p_{kjC}$ are not both 0 or 
both 1. For any examinee $i$ with $Y_{i+} = k$ and any $k$ in $K$ such that $\hat p_{k1C}$ and $\hat p_{kjC}$ are not 
both 0 or both 1, the difference $\hat\mu_{k1C} - \hat\mu_{kjC}$ has the same value, and this value may be 
assigned to $\hat\beta_j = \hat\beta_{jC}$. This practice may lead to an estimate of $\infty$ or $-\infty$. If no $k$ in $K$ 
exists such that $\hat p_{kjC}$ and $\hat p_{k1C}$ are not both 0 or both 1, then there is no obvious basis for 
definition of $\hat\beta_j = \hat\beta_{jC}$. In this instance, the convention is adopted that $\hat\beta_j = \hat\beta_{jC} = 0$. 


1.4 Consistency 

Even if the Rasch model is valid, if the number $q$ of items is constant, the $\beta_j$ are 
constant, and $n$ approaches $\infty$, then the $\hat\theta_i$ are not consistent estimates of the $\theta_i$, and 
the $\hat\beta_j$ are not consistent estimates of the $\beta_j$ (Andersen, 1973a, pp. 66-69). Indeed, 
the probability approaches 1 that ordinary joint maximum-likelihood estimates do not 
even exist (Haberman, 1977b). Even if extended joint maximum-likelihood estimates are 
employed, $\hat\theta_i$ remains inconsistent for $\theta_i$, and $\hat\beta_j$ remains inconsistent for $\beta_j$. As shown in 
this section, $\hat\beta_j$ converges almost surely to a limit $\beta_{jM}$ that differs from $\beta_j$ by a term of 
order $q^{-1}$. 
Consistency results are known to be quite different if the Rasch model holds and 
the number $q$ of items increases (Haberman, 1977b). If $q^{-1}\log n$ approaches 0, then the 
probability approaches 1 that each $\hat\theta_i$, $\hat\beta_j$, and $\hat\mu_{ij}$ is finite, and 

$$\max_{1 \le i \le n} |\hat\theta_i - \theta_i|,$$

$$\max_{1 \le j \le q} |\hat\beta_j - \beta_j|,$$

and 

$$\max_{1 \le i \le n} \max_{1 \le j \le q} |\hat\mu_{ij} - \mu_{ij}|$$

all converge in probability to 0. Given that the function lgt has a bounded derivative, it is 
also true that $\max_i \max_j |\hat p_{ij} - p_{ij}|$ converges in probability to 0. These results are not fully 
satisfactory for the SAT I data under study. For the Math exam, $q = 60$, $n = 446{,}607$, 
and $q^{-1}\log n = 0.217$. For the Verbal exam, $q = 78$ and $n$ is still 446,607, so that $q^{-1}\log n$ 
is 0.167. Neither value of $q^{-1}\log n$ is very small, so that the asymptotic results are not 
inconsistent with the already observed problem that not all $\hat\theta_i$ are finite in either the Math 
or the Verbal exam. 
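
The two ratios quoted above follow directly from the test lengths and the sample size given in the text, with the natural logarithm. The short check below uses only those figures; the function name `log_n_over_q` is an illustrative assumption.

```python
import math

def log_n_over_q(n, q):
    """The ratio (log n) / q that governs the consistency result."""
    return math.log(n) / q

n = 446_607                       # examinees, as stated in the text
math_ratio = log_n_over_q(n, 60)  # Math exam, 60 items
verbal_ratio = log_n_over_q(n, 78)  # Verbal exam, 78 items
print(round(math_ratio, 3), round(verbal_ratio, 3))
```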

In this report, the consistency results previously derived are extended to new 
situations. It is shown that the difficulty estimate $\hat\beta_j$ for item $j$ converges in probability to 
the corresponding difficulty $\beta_j$ for item $j$ whenever the number $q$ of items approaches $\infty$. 
Indeed $\max_{1 \le j \le q} |\hat\beta_j - \beta_j|$ converges in probability to 0. For any specific examinee $i$, $\hat\theta_i - \theta_i$ 
converges in probability to 0. In addition, weak convergence results for the distribution 
of $\hat\theta_i$ follow. For example, the empirical distribution function of the $\hat\theta_i$ converges to the 
common distribution function $D$ of $\theta_i$ at any continuity point of $D$. Consistency results are 
also demonstrated for estimates of expected logarithmic penalty functions associated with 
the Rasch model. 

To begin, consider the case of a fixed number $q$ of items. The following theorems 
summarize the basic consistency problems for estimation of the examinee ability $\theta_i$, the logit 
$\mu_{ij}$ for examinee $i$ and item $j$, and the probability $p_{ij}$ of a correct response for examinee $i$ 
and item $j$. 

Theorem 2 Let the number $q$ of items be fixed, and let the number $n$ of examinees approach 
$\infty$. Then the probability that joint maximum-likelihood estimates exist approaches 0. 
Proof. Let $P_k$, $0 \le k \le q$, be the unconditional probability $P(Y_{i+} = k)$ that $Y_{i+} = k$. 
Then each $P_k$ is positive, so that the probability $P_0 + P_q$ is positive that examinee 
$i$ has either no correct responses ($Y_{i+} = 0$) or all correct responses ($Y_{i+} = q$). Joint 
maximum-likelihood estimates can exist only if $0 < Y_{i+} < q$ for each examinee $i$ from 1 to 
$n$. The probability that $0 < Y_{i+} < q$ for $1 \le i \le n$ is $[1 - (P_0 + P_q)]^n$. As $n$ approaches $\infty$, 
this probability approaches 0. 
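
The speed at which this bound decays is easy to see numerically. In the sketch below, the values assigned to $P_0$ and $P_q$ are purely hypothetical and are not estimates from the SAT data; the function name is also an assumption.

```python
def prob_all_scores_interior(p0, pq, n):
    """[1 - (P_0 + P_q)]^n: chance that no examinee among n has score 0 or q."""
    return (1.0 - (p0 + pq)) ** n

# Hypothetical probabilities of a zero score and of a perfect score.
p0, pq = 1e-5, 1e-4
bounds = [prob_all_scores_interior(p0, pq, n) for n in (10**3, 10**5, 10**7)]
```

Even with extreme scores this rare, the bound is already negligible once the number of examinees reaches the hundreds of thousands.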

Theorem 3 Under the conditions of Theorem 2, for any integer $i \ge 1$, $\hat\theta_i - \theta_i$ does not 
converge in probability to 0. For each integer $i \ge 1$ and any integer $j$, $1 \le j \le q$, $\hat\mu_{ij} - \mu_{ij}$ 
does not converge in probability to 0. 

Proof. If $Y_{i+} = 0$, then $\hat\theta_i = -\infty$. If $Y_{i+} = q$, then $\hat\theta_i = \infty$. Because the probabilities $P_0$ 
and $P_q$ defined in the proof of Theorem 2 are positive and constant and because $\hat\theta_i = -\infty$ 
with probability at least $P_0$ and $\hat\theta_i = \infty$ with probability at least $P_q$, it follows that $\hat\theta_i - \theta_i$ 
does not converge in probability to 0. Virtually the same argument applies to $\hat\mu_{ij}$, so that 
$\hat\mu_{ij} - \mu_{ij}$ does not converge in probability to 0. It then follows that $\hat p_{ij}$ is 0 with probability 
at least $P_0$, and $\hat p_{ij} = 1$ with probability at least $P_q$, so that $\hat p_{ij} - p_{ij}$ does not converge in 
probability to 0. 

To demonstrate inconsistency of the $\hat\beta_j$ in the case of $q$ fixed is relatively complicated 
in the case of extended joint maximum-likelihood estimates. Results depend on the 
expectation $E(f_{kj})$ of the count $f_{kj}$ for $0 \le k \le q$ and $1 \le j \le q$. To find $E(f_{kj})$, consider 
the conditional expectation $m_{kjC} = m_{kj}(\beta)$ of $Y_{ij}$ given $Y_{i+} = k$. As is well-known, this 
expectation is a function of the vector $\beta$ of item difficulties $\beta_j$, $1 \le j \le q$, and not of the 
examinee ability $\theta_i$. To find $m_{kj}(\beta)$, let 

$$S_k(\beta) = \sum_{c \in \Gamma(k)} \exp(-\beta^T c)$$

be the symmetric function of order $k$ for the $q$ variables $\exp(-\beta_j)$, $1 \le j \le q$ (Fischer, 
1981). Let 

$$S_{kj}(\beta) = \sum_{c \in \Gamma(k)} c_j \exp(-\beta^T c),$$

so that $-S_{kj}(\beta)$ is the partial derivative of $S_k(\beta)$ with respect to $\beta_j$. Let $Y_i$ denote the 
$q$-dimensional vector of $Y_{ij}$, $1 \le j \le q$. Then the conditional probability that $Y_i = c$ in 
$\Gamma(k)$ given $Y_{i+} = k$ is 

$$p_{JC}(c) = \exp(-\beta^T c)/S_k(\beta),$$

and 

$$m_{kj}(\beta) = \sum_{c \in \Gamma(k)} c_j\, p_{JC}(c) = \frac{S_{kj}(\beta)}{S_k(\beta)}.$$

For $k = 0$, $m_{kj}(\beta)$ is 0. For $k = q$, $m_{kj}(\beta)$ is 1. Otherwise, $m_{kj}(\beta) > 0$. Given the 
probability $P_k = P(Y_{i+} = k)$ from the proof of Theorem 2 and the conditional probability 
$m_{kjC}$, it follows that $E(n_k) = nP_k$ and 

$$E(f_{kj}) = E(n_k)\,m_{kjC} = nP_k m_{kjC}.$$

The strong law of large numbers implies that $f_{kj}/n$ converges almost surely to $P_k m_{kjC} \ge 0$. 
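
The conditional expectations can be computed from the symmetric functions by a convolution-style recursion, without enumerating response patterns. The sketch below is an illustration under assumed item difficulties; the function names `elementary_symmetric` and `conditional_means` are hypothetical, and the sketch makes no attempt at the numerical safeguards that long tests may require.

```python
import math

def elementary_symmetric(eps):
    """Return [S_0, ..., S_q], the symmetric functions of the values in eps."""
    S = [1.0]
    for e in eps:
        new = S + [0.0]
        for k in range(len(S), 0, -1):
            new[k] += e * S[k - 1]     # patterns that include the new item
        S = new
    return S

def conditional_means(beta):
    """m[k][j] = S_kj(beta) / S_k(beta): chance item j is correct given score k."""
    q = len(beta)
    eps = [math.exp(-b) for b in beta]
    S = elementary_symmetric(eps)
    m = [[0.0] * q for _ in range(q + 1)]
    for j in range(q):
        S_without_j = elementary_symmetric(eps[:j] + eps[j + 1:])
        for k in range(1, q + 1):
            # S_kj = eps_j times the order k-1 symmetric function of the other items
            m[k][j] = eps[j] * S_without_j[k - 1] / S[k]
    return m

beta = [0.0, 0.4, -0.7, 1.2]          # hypothetical difficulties, beta[0] = 0
m = conditional_means(beta)
```

For each score k, the entries of `m[k]` sum to k, in agreement with the fact that the conditional expectation of the total score given the total score is the score itself.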

Given the conditional probabilities $m_{kjC}$, $0 \le k \le q$, and $1 \le j \le q$, the probabilities 
$P_k$, $0 \le k \le q$, and the unconditional probabilities $p_j = P(Y_{ij} = 1)$, $1 \le j \le q$, the basic 
limiting properties of $\hat\theta_{kC}$ and $\hat\beta_j$ can be summarized as in the following theorem. 

Theorem 4 Under the conditions of Theorem 2, $\hat\theta_{kC}$ converges almost surely to $\theta_{kM}$, $0 \le 
k \le q$, $k$ in $K$, and $\hat\beta_j$ converges almost surely to $\beta_{jM}$, $1 \le j \le q$, where $\theta_{0M} = -\infty$, 
$\theta_{qM} = \infty$, real $\theta_{kM}$, $1 \le k \le q - 1$, real $\beta_{jM}$, $1 \le j \le q$, and real $p_{kjM}$, $0 \le k \le q$, $1 \le j \le q$, 
are uniquely determined by the conditions that $p_{0jM} = 0$, $p_{qjM} = 1$, $\beta_{1M} = 0$, 

$$p_{kjM} = \operatorname{lgt}(\theta_{kM} - \beta_{jM}), \quad 1 \le k \le q - 1, \ 1 \le j \le q, \tag{14}$$

and 

$$\sum_{k=0}^q P_k p_{kjM} = \sum_{k=0}^q P_k m_{kjC} = p_j, \quad 1 \le j \le q, \tag{15}$$

$$\sum_{j=1}^q p_{kjM} = \sum_{j=1}^q m_{kjC} = k, \quad 0 \le k \le q. \tag{16}$$

Proof. Existence and uniqueness of $\theta_{kM}$, $\beta_{jM}$, and $p_{kjM}$ follow from standard results 
for log-linear models (Haberman, 1974, chap. 8). Results on almost sure convergence follow 
from general results on concave likelihood functions (Haberman, 1989). 
To interpret the limit parameters $\theta_{kM}$, $\beta_{jM}$, and $p_{kjM}$, logarithmic penalty functions 
may be employed (Gilula & Haberman, 1994; Gilula & Haberman, 1995). Let 

$$H(x, y) = -x\log(y) - (1 - x)\log(1 - y)$$

for $x$ and $y$ between 0 and 1, where $0\log 0 = 0$. Consider probability prediction of the 
responses $Y_{ij}$ from the sums $Y_{i+}$ under the incorrect model that, conditional on $Y_{i+} = k$, 
the $Y_{ij}$, $1 \le j \le q$, are independently distributed with probability 

$$\pi_{kj} = \operatorname{lgt}(\theta'_k - \beta'_j)$$

that $Y_{ij} = 1$ for unknown real parameters $\theta'_k$, $0 \le k \le q$, and $\beta'_j$, $1 \le j \le q$, $\beta'_1 = 0$. The 
expected logarithmic penalty per item is 

$$q^{-1}\sum_{k=0}^q \sum_{j=1}^q P_k H(m_{kjC}, \pi_{kj}).$$

If $0\log 0 = 0$, then the minimum expected penalty per item is 

$$H_J = q^{-1}\sum_{k=0}^q \sum_{j=1}^q P_k H(m_{kjC}, p_{kjM}).$$

The expected penalty per observation approaches $H_J$ if $\theta'_k$ approaches $\theta_{kM}$, $0 \le k \le q$, and 
$\beta'_j$ approaches $\beta_{jM}$, $1 \le j \le q$. Theorem 4 implies that the estimated expected log penalty 
function per item 

$$\hat H_J = -\frac{\ell_{JCM}}{nq}$$

converges almost surely to $H_J$. 


Theorem 4 implies that inconsistency of $\hat\beta_j$ is observed when $\beta_j$ and $\beta_{jM}$ differ. This 
situation is typically but not necessarily the case. For instance, $\beta_j = \beta_{jM}$ if all $\beta_j$ are 0. On 
the other hand, it is rather unusual to have $\beta_j = \beta_{jM}$ for all items $j$. In the simplest case, 
$q = 2$, so that 

$$m_{1j}(\beta) = \begin{cases}
\dfrac{1}{1 + \exp(-\beta_2)}, & j = 1, \\
\dfrac{\exp(-\beta_2)}{1 + \exp(-\beta_2)}, & j = 2,
\end{cases}$$

$$p_{1jM} = \operatorname{lgt}(\theta_{1M} - \beta_{jM}) = m_{1j}(\beta),$$

$$p_{11M} + p_{12M} = 1.$$

It follows that $\theta_{1M} = \beta_2$ and $\beta_{2M} = 2\beta_2$ (Andersen, 1973a). If more than two items are 
present, then no simple expressions are normally available for $\beta_{jM}$. 
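
The two-item limit can be traced step by step from conditions (15) and (16). The derivation below is a sketch added for the reader; it uses only the identity $\operatorname{lgt}(x) + \operatorname{lgt}(-x) = 1$ and the constraints $\beta_1 = \beta_{1M} = 0$.

```latex
% Two items, \beta_1 = \beta_{1M} = 0.
% Condition (15), with p_{0jM} = m_{0jC} = 0 and p_{2jM} = m_{2jC} = 1:
P_1\, p_{1jM} = P_1\, m_{1jC}
  \;\Longrightarrow\; p_{1jM} = m_{1j}(\beta).
% For j = 1, since 1 / (1 + \exp(-\beta_2)) = \operatorname{lgt}(\beta_2):
\operatorname{lgt}(\theta_{1M}) = \operatorname{lgt}(\beta_2)
  \;\Longrightarrow\; \theta_{1M} = \beta_2.
% Condition (16) with k = 1:
\operatorname{lgt}(\theta_{1M}) + \operatorname{lgt}(\theta_{1M} - \beta_{2M}) = 1
  \;\Longrightarrow\; \theta_{1M} - \beta_{2M} = -\theta_{1M}
  \;\Longrightarrow\; \beta_{2M} = 2\theta_{1M} = 2\beta_2.
```

With two items, the limit of the joint estimate is therefore twice the true difficulty, however large the sample.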

The expected logarithmic penalty per item $H_J$ is at least as large as the conditional 
entropy measure per item 

$$H_M = q^{-1}\sum_{k=1}^{q-1} P_k \sum_{j=1}^q H(m_{kjC}, m_{kjC})$$

that corresponds to the conditional entropy per item of $Y_{iB}$ given $Y_{i+}$ for a random variable 
$B$ uniformly distributed on the integers 1 to $q$ and independent of the $Y_{ij}$, $1 \le j \le q$. One 
has $H_J = H_M$ if, and only if, $p_{kjM} = m_{kjC}$. The entropy per item $H_M$ has an estimate 

$$\hat H_M = \frac{1}{nq}\sum_{k=1}^{q-1} n_k \sum_{j=1}^q H(f_{kj}/n_k, f_{kj}/n_k)$$

that converges almost surely to $H_M$. In the definition of $\hat H_M$, the convention is followed 
that $0/0 = 0$. 

The magnitude of the difference between $\beta_j$ and $\beta_{jM}$ is of order $q^{-1}$. For a formal
statement and proof of this claim, consider the following theorem in which the number of
items is allowed to increase.

Theorem 5 A real number $\tau > 0$ exists such that $|\beta_{jM} - \beta_j| < \tau/q$ for all $q > 1$ and all
items $j$, $1 \le j \le q$.

Proof. To verify this claim, consider the difference between $m_{kjC}$ and $p_{kjC} =
\mathrm{lgt}(\theta_{kC} - \beta_j)$, $1 \le k \le q - 1$, where $p_{kjC}$ is uniquely defined by the condition that

$$\sum_{j=1}^{q} p_{kjC} = k$$

(Haberman, 1974, chap. 10). Let $U_{kj}$ be independent Bernoulli observations with
probability $p_{kjC}$ that $U_{kj} = 1$. The conditional probability $m_{kjC}$ that $Y_{ij} = 1$ given that
$Y_{i+} = k$ is then the conditional probability that $U_{kj} = 1$ given that

$$U_{k+} = \sum_{j=1}^{q} U_{kj} = k.$$

This latter probability is then

$$P(U_{kj} = 1) P(U_{k+} - U_{kj} = k - 1) / P(U_{k+} = k).$$


Let

$$\sigma^2_{kjC} = p_{kjC}(1 - p_{kjC})$$

be the variance of $U_{kj}$. If $k$, $q$, and $n$ are selected so that

$$\sigma^2_{k+C} = \sum_{j=1}^{q} \sigma^2_{kjC}$$

approaches $\infty$, then

$$(U_{k+} - k)/\sigma_{k+C}$$

and

$$(U_{k+} - U_{kj} - k + p_{kjC})/(\sigma^2_{k+C} - \sigma^2_{kjC})^{1/2}$$

converge in distribution to a standard normal random variable (Cramer, 1946, pp.
217-218). A refinement of this result permits approximation of $m_{kjC}$. To derive the desired
approximations requires some simple modifications of results on Edgeworth expansions for
lattice distributions (Esseen, 1945). Terms are used based on the normal density function
and on its first three derivatives. Let

$$\psi_k = \frac{1}{\sigma^2_{k+C}} \sum_{j=1}^{q} (2p_{kjC} - 1)\sigma^2_{kjC},$$

so that $-\psi_k/\sigma_{k+C}$ is the skewness coefficient of $U_{k+}$. It then follows that

$$\sigma^4_{k+C}\left[m_{kjC} - p_{kjC} - (2\sigma^2_{k+C})^{-1}\sigma^2_{kjC}(2p_{kjC} - 1 - \psi_k)\right], \quad 1 \le j \le q,$$

is uniformly bounded. This result indicates that $m_{kjC} - p_{kjC}$ is of order $1/\sigma^2_{k+C}$.
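
The closeness of the conditional probabilities $m_{kjC}$ to the independence probabilities $p_{kjC}$ can be illustrated with exact computations. The sketch below is an illustration under assumed inputs: item difficulties on an evenly spaced grid from -1 to 1 (a hypothetical choice), elementary symmetric functions for the exact conditional probabilities, and bisection for $\theta_{kC}$. All function names are invented for the example.

```python
import math

def lgt(x):
    return 1.0 / (1.0 + math.exp(-x))

def elementary_symmetric(eps):
    """s[k] = sum over all subsets of size k of the product of the eps values."""
    s = [1.0] + [0.0] * len(eps)
    for count, e in enumerate(eps, start=1):
        for k in range(count, 0, -1):
            s[k] += e * s[k - 1]
    return s

def conditional_probs(beta, k):
    """Exact m_kjC = P(Y_ij = 1 | Y_i+ = k) under the Rasch model, for 1 <= k <= q."""
    eps = [math.exp(-b) for b in beta]
    s_all = elementary_symmetric(eps)
    probs = []
    for j in range(len(beta)):
        s_rest = elementary_symmetric(eps[:j] + eps[j + 1:])
        probs.append(eps[j] * s_rest[k - 1] / s_all[k])
    return probs

def independent_probs(beta, k):
    """p_kjC = lgt(theta_kC - beta_j), with theta_kC chosen so the p_kjC sum to k."""
    lo, hi = -50.0, 50.0
    for _ in range(200):  # bisection on an increasing function of theta
        mid = 0.5 * (lo + hi)
        if sum(lgt(mid - b) for b in beta) < k:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return [lgt(theta - b) for b in beta]

def max_gap(q):
    """Largest |m_kjC - p_kjC| over items for k = q // 2 on the assumed grid."""
    beta = [-1.0 + 2.0 * j / (q - 1) for j in range(q)]
    k = q // 2
    m = conditional_probs(beta, k)
    p = independent_probs(beta, k)
    return max(abs(a - b) for a, b in zip(m, p))
```

Calling `max_gap` for increasing `q` shows the gap shrinking as the sum of variances grows, in line with the order $1/\sigma^2_{k+C}$ stated above.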

To show that $\beta_{jM} - \beta_j$ is of order $q^{-1}$ requires use of fixed point theorems (Loomis &
Sternberg, 1968, pp. 228-234). In applications in this paper, the maximum norm is used,
so that a $q$-dimensional vector $x$ with coordinate $x_j$ for $1 \le j \le q$ has maximum norm

$$|x| = \max_{1 \le j \le q} |x_j|.$$

Consider solution of (15) for $2 \le j \le q$ subject to the constraints that $\beta_1 = 0$ and (14) and
(16) hold. For each $q$-dimensional vector $x$ with coordinates $x_j$, $1 \le j \le q$, there is a unique
real value $g_k(x)$, $1 \le k \le q - 1$, such that

$$\sum_{j=1}^{q} \mathrm{lgt}(g_k(x) - x_j) = k.$$

Let $w$ be the real function with value

$$w(y) = \mathrm{lgt}(y)[1 - \mathrm{lgt}(y)]$$

for real $y$, and let

$$w_{kj}(x) = w(g_k(x) - x_j).$$

The function $g_k$ is infinitely differentiable, and the gradient $\nabla g_k$ of $g_k$ has coordinate $j$
equal to

$$g_{kj}(x) = \frac{w_{kj}(x)}{\sum_{h=1}^{q} w_{kh}(x)}.$$
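
The implicit function $g_k$ and its gradient can be sketched as follows. This is an illustrative computation only: bisection is one convenient way to solve the defining equation, and the function names are assumptions for the example.

```python
import math

def lgt(x):
    return 1.0 / (1.0 + math.exp(-x))

def w(y):
    """w(y) = lgt(y) [1 - lgt(y)]."""
    p = lgt(y)
    return p * (1.0 - p)

def g(x, k):
    """Solve sum_j lgt(g - x_j) = k for g by bisection (requires 1 <= k <= len(x) - 1)."""
    lo, hi = min(x) - 50.0, max(x) + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(lgt(mid - xj) for xj in x) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def grad_g(x, k):
    """Gradient of g_k: coordinate j equals w_kj(x) / sum_h w_kh(x)."""
    gk = g(x, k)
    weights = [w(gk - xj) for xj in x]
    total = sum(weights)
    return [wt / total for wt in weights]
```

The gradient coordinates are positive and sum to 1, so $g_k(x)$ behaves like a weighted average of the coordinates of $x$ when $x$ is perturbed.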

Let $F(x, y)$ be defined for $q$-dimensional vectors $x$ and $q - 1$ by $q$ arrays $y$ with coordinates
$y_{kj}$ for $k$ from 1 to $q - 1$ and $j$ from 1 to $q$ so that $F(x, y)$ has coordinate $j$ equal to

$$F_j(x, y) = \sum_{k=1}^{q-1} \left[k^{-1} y_{k+} \mathrm{lgt}(g_k(x) - x_j) - y_{kj}\right],$$

where

$$y_{k+} = \sum_{j=1}^{q} y_{kj}.$$

Then

$$F(\beta, z) = 0$$

for $z_{kj} = P_k p_{kjC}$, and

$$F(\beta_M, y') = 0$$

for $\beta_M$ with coordinates $\beta_{jM}$ for $1 \le j \le q$ and for $y'$ with coordinates $y'_{kj} = P_k p_{kjM}$. The
function $F$ is infinitely differentiable. It is linear in the second argument. To evaluate the
partial differential with respect to the first argument, let $\delta$ be the Kronecker delta function.
Then the partial gradient of $F_j$ with respect to the first argument has coordinate $j'$ equal to

$$F_{jj'}(x, y) = \sum_{k=1}^{q-1} k^{-1} y_{k+} w_{kj}(x)[g_{kj'}(x) - \delta_{jj'}],$$

so that the sum

$$\sum_{j'=1}^{q} F_{jj'}(x, y) = 0.$$

Let $\nabla F(x, y)$ be the $q$ by $q$ matrix with elements $F_{jj'}(x, y)$ for integers $j$ and $j'$ from 1 to $q$.

Use of fixed point theorems yields the desired conclusion, although some technical
complications emerge in that the most efficient parameterization for these results uses
parameters that sum to 0. The basic observation is that $F(\gamma, y)$ and $F(\beta, y)$ are the same if

$$\gamma_j = \beta_j - q^{-1} \sum_{h=2}^{q} \beta_h.$$

Similarly, $\nabla F(\beta, y) = \nabla F(\gamma, y)$. In like manner, $\gamma_M$ is defined with coordinates

$$\gamma_{jM} = \beta_{jM} - q^{-1} \sum_{h=2}^{q} \beta_{hM}.$$

One has

$$\beta_j = \gamma_j - \gamma_1$$

and

$$\beta_{jM} = \gamma_{jM} - \gamma_{1M}.$$


Let $\Omega$ be the smallest value of

$$\sum_{k=1}^{q-1} P_k w_{kj}(\beta) g_{kj'}(\beta)$$

for $j$ and $j'$ integers between 1 and $q$. Let $\mathbf{1}$ be the $q$-dimensional vector with all coordinates
equal to 1. One seeks $x$ given $y$ such that

$$G(x, y) = F(x, y) - \Omega \mathbf{1}\mathbf{1}^T x = 0.$$

The use of $\Omega$ leads to $\gamma$ as the unique solution of $G(x, z) = 0$ and leads to $\gamma_M$ as the
unique solution of $G(x, y') = 0$. Let

$$\nabla G(x, y) = \nabla F(x, y) - \Omega \mathbf{1}\mathbf{1}^T,$$

and let

$$D(x, y) = x - [\nabla G(\gamma, y)]^{-1} G(x, y).$$

Consider an algorithm in which an initial value is $\gamma_0 = \gamma$ and subsequent values $\gamma_t$ are
defined so that

$$\gamma_{t+1} = D(\gamma_t, z).$$

The result on fixed points that is required is that

$$|\gamma_t - \gamma_M| \le C^t \delta/(1 - C)$$

if $C$ and $\delta$ are positive constants such that $C < 1$ and

$$|D(x, z) - D(\gamma, z)| \le \delta |x - \gamma|$$

whenever $|x - \gamma| \le C/(1 - \delta)$. Use of standard albeit tedious arguments from calculus
shows that the upper limit of $q|\gamma_M - \gamma|$ is finite. It then follows that the upper limit of
$q|\beta_M - \beta|$ is finite. The conclusion of the theorem then follows.

Given the definitions of $\theta_{kM}$ and $\theta_{kC}$ and the properties of $g_k$, the proof of Theorem 5
implies that $q|\theta_{kM} - \theta_{kC}|$ and $q|p_{kjM} - p_{kjC}|$ are bounded if $q$ approaches $\infty$ and $k/q$
converges to a positive constant less than 1. More precise expressions for these differences
can be obtained but are not especially attractive.

The arguments used in Theorem 5 also can be applied to a variety of expected
logarithmic penalties. To begin, consider the conditional entropy $H_B$ of $Y'_B$ given $Y_{1+}$ and
$B$ for $B$ uniformly distributed on the integers 1 to $q$ and $Y'_j$ Bernoulli random variables for
$1 \le j \le q$ such that $P(Y'_j = 1 | Y_{1+} = k) = p_{kjC}$. Then

$$H_B = q^{-1} \sum_{k=1}^{q-1} P_k \sum_{j=1}^{q} H(p_{kjC}, p_{kjC})$$

and $H_M$ differ by a term of order $q^{-1}$. A variety of entropy measures are closely linked.
The difference $H_J - H_M$ is of order $q^{-2}$, so that $H_J - H_B$ is of order $q^{-1}$. With a similar
argument based on the normal approximation for the distribution of $Y_{1+}$ given $\theta_1$, it follows
that $H_J - H_\theta$ is of order $q^{-1}$ if

$$H_\theta = q^{-1} \sum_{j=1}^{q} E(H(p_{1j}, p_{1j}))$$

is the conditional entropy per item of $Y_1$ given $\theta_1$. In turn, if $p^* = \mathrm{lgt}(\theta_1 - \beta^*)$, then $H_\theta$
converges to

$$H^* = E(H(p^*, p^*)),$$

the conditional entropy of a random variable $Y^*$ with values 0 and 1 such that, conditional
on $\theta_1$ and $\beta^*$, the probability is $\mathrm{lgt}(\theta_1 - \beta^*)$ that $Y^*$ is 1.

Because $Y_{1+}$ has no more than $q + 1$ possible values, the conditional entropy per item

$$H_{+\theta} = -q^{-1} \sum_{k=0}^{q} E\left(P(Y_{1+} = k | \theta_1) \log P(Y_{1+} = k | \theta_1)\right)$$

of $Y_{1+}$ given $\theta_1$ and the unconditional entropy per item

$$H_+ = -q^{-1} \sum_{k=0}^{q} P_k \log P_k$$

of $Y_{1+}$ cannot exceed $q^{-1} \log(q + 1)$. It follows that the conditional entropy per item

$$H_C = -q^{-1} \sum_{k=0}^{q} P_k \sum_{c \in \Gamma(k)} p_{JC}(c) \log p_{JC}(c) = H_\theta - H_{+\theta}$$

of $Y_1$ given $Y_{1+}$ and $\theta_1$ differs from $H_\theta$ by a term of order $q^{-1} \log q$. The conditional
distribution of $Y_1$ given $Y_{1+}$ and $\theta_1$ is assumed independent of $\theta_1$, so that $H_C$ is also the
conditional entropy per item of $Y_1$ given $Y_{1+}$. It follows that $H_C$ differs from $H_B$, $H_J$, and
$H_M$ by terms of order $q^{-1} \log q$. In turn, the unconditional entropy per item

$$H_U = -q^{-1} \sum_{k=0}^{q} \sum_{c \in \Gamma(k)} p_J(c) \log p_J(c) = H_C + H_+$$

of $Y_1$ differs from $H_\theta$, $H_C$, $H_J$, $H_B$, and $H_M$ by terms of order $q^{-1} \log q$. Thus $H_U$, $H_\theta$, $H_C$,
$H_J$, $H_B$, and $H_M$ all approach $H^*$ as the number of items $q$ becomes large.

Given that the bias $\beta_{jM} - \beta_j$ is reduced as $q$ increases, there is the suggestion that
the inconsistency of the joint maximum likelihood estimators for the Rasch model can
be removed if the asymptotic framework is changed so that the sample size $n$ and
the number of items $q$ both approach infinity (Haberman, 1977b). The following result is
available.

Theorem 6 Let $q$ approach $\infty$, so that $n$ approaches $\infty$. Then $|\hat\beta - \beta_M|$ and $|\hat\beta - \beta|$ both
converge in probability to 0.

Proof. Consider solution of

$$G(x, f) = 0$$

for $f$ with coordinates $f_{kj}$. The previous argument in the proof of Theorem 5 based on
fixed-point theorems is easily modified. The normal approximations for the sums

$$\sum_{k=1}^{q-1} (f_{kj} - n_k m_{kjC})$$

and large-deviation arguments may be used to demonstrate that $|\hat\beta - \beta_M|$ and $|\hat\beta - \beta|$
both converge in probability to 0.

Under the conditions of Theorem 6, it also follows that $\hat\theta_{kM} - \theta_{kC}$ and $\hat{p}_{kjM} - p_{kjC}$
converge in probability to 0 if $k/q$ converges to a positive constant less than 1. In turn, it
follows that, for any specific individual $i$, $\hat\theta_i$ converges in probability to $\theta_i$. Thus for any
real $\delta > 0$, the fraction of examinees $i \le n$ with $|\hat\theta_i - \theta_i| > \delta$ converges in probability to
0. This result permits estimation of the distribution of the random variable $\theta_i$. Let $\hat{D}$ be
the empirical distribution function of the $\hat\theta_i$, so that $\hat{D}(x)$ is the fraction of the $\hat\theta_i$ that do
not exceed the real number $x$. If $D$ is continuous at $x$, then $|\hat{D}(x) - D(x)|$ converges in
probability to 0. The argument is essentially the same as one used to study convergence
in distribution of sums of two random variables, one of which converges in probability to
0 (Rao, 1973, pp. 122-123). Simple modification of the proof of the Helly-Bray theorem
implies that if $h$ is a continuous or piecewise-continuous bounded function on the extended
real line and $h$ is continuous at $\theta_1$ with probability 1, then

$$\hat{E}(h(\theta)) = n^{-1} \sum_{i=1}^{n} h(\hat\theta_i)$$

converges in probability to $E(h(\theta_1))$ (Rao, 1973, pp. 117-118). If the common distribution
function $D$ of the $\theta_i$ is continuous, as is the case for $\theta_1$ a continuous random variable, then
similar arguments show that

$$|\hat{D} - D| = \sup_x |\hat{D}(x) - D(x)|$$

converges in probability to 0.

The difference $\hat{H}_J - H_U$ then converges in probability to 0, so that the various
conditional entropies under study can be estimated. The difference $\hat{H}_M - H_M$ can only be
expected to converge in probability to 0 if $q^2/n$ approaches 0. For the SAT I data under
study, $\hat{H}_J$ is not likely to be a very accurate estimate of the unconditional entropy $H_U$,
for $q^{-1} \log q$ is 0.068 for the Math test and 0.056 for the Verbal test. The observed value
of $\hat{H}_J$ is 0.450 for the Math exam and 0.501 for the Verbal exam. Thus the bias issue is
potentially a major problem.
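
The quoted values of $q^{-1} \log q$ can be reproduced directly, using the natural logarithm and the test lengths of 60 (Math) and 78 (Verbal) items given in the text. The function name is illustrative.

```python
import math

def entropy_gap_order(q):
    """q^{-1} log q, the order of the gap between the entropy measures."""
    return math.log(q) / q

math_gap = entropy_gap_order(60)    # about 0.068
verbal_gap = entropy_gap_order(78)  # about 0.056
```

Relative to observed penalties of 0.450 and 0.501, gaps of this size are more than a tenth of the quantity being estimated.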


1.5 Normal Approximations 

The bias issues already noted in the discussion of consistency have an unusual effect
on normal approximations. It is relatively easy to find a normal approximation for the
item difficulty estimate $\hat\beta_j$, but this approximation is not really satisfactory because the
asymptotic mean is $\beta_{jM}$ rather than $\beta_j$. A normal approximation for the ability estimate
$\hat\theta_i$ is available with relatively little difficulty for $q$ large, but there are problems in practice
with the accuracy achieved.

If $q$ is constant and $n$ becomes large, then a normal approximation is available for $\hat\beta$ but
not for the $\hat\theta_i$. The normal approximation for $\hat\beta$ is somewhat different than the conventional
approximation expected in a logit model, for the asymptotic mean is $\beta_M$ rather than $\beta$,
and the asymptotic covariance matrix is a relatively complicated expression. To define the
required asymptotic covariance matrix requires a somewhat lengthy series of intermediate
definitions. Let $Y^+_{ij}$ be the adjusted random variable with value $Y_{ij} - p_{kjM}$ for $Y_{i+} = k$.
Let $V^+$ be the covariance matrix of the $q$-dimensional vector $Y^+_1$ with coordinates $Y^+_{1j}$ for
$1 \le j \le q$. Let $V^+_{jj'}$ be row $j$ and column $j'$ of $V^+$. Let

$$\sigma^2_{kjM} = p_{kjM}(1 - p_{kjM})$$

be the variance of a Bernoulli random variable that is 1 with probability $p_{kjM}$, let

$$\sigma^2_{+jM} = \sum_{k=1}^{q-1} P_k \sigma^2_{kjM},$$

$$\sigma^2_{k+M} = \sum_{j=1}^{q} \sigma^2_{kjM},$$

and let

$$W_{jj'} = \sigma^2_{+jM} \delta_{jj'} - \sum_{k=1}^{q-1} P_k \sigma^2_{kjM} \sigma^2_{kj'M} / \sigma^2_{k+M}.$$

Let $W$ be the $q$ by $q$ matrix with row $j$ and column $j'$ equal to $W_{jj'}$. Note that

$$\sum_{j'=1}^{q} W_{jj'} = \sum_{j'=1}^{q} V^+_{jj'} = 0,$$

and $W$ and $V^+$ are symmetric and positive semi-definite. Let $W^-$ be the Moore-Penrose
inverse of $W$, so that

$$W W^- = W^- W = I - q^{-1} \mathbf{1}\mathbf{1}^T,$$

$$W W^- W = W,$$

and

$$W^- W W^- = W^-,$$

where $I$ is the $q$ by $q$ identity matrix (Stewart, 1973, p. 326). Let $K$ be the $q$ by $q$ matrix
with row $i$ and column $j$ equal to

$$K_{ij} = \begin{cases} 0, & i = 1, \\ 0, & i \ne j,\ i > 1,\ j > 1, \\ 1, & i = j > 1, \\ -1, & j = 1 < i, \end{cases}$$

so that $\beta = K\gamma$. Let $K^T$ be the transpose of $K$.
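
The role of $K$ can be illustrated with a small sketch: centering a difficulty vector with $\beta_1 = 0$ gives parameters that sum to 0, and applying $K$ (subtracting the first coordinate from the others and zeroing the first) recovers the original vector. The helper names and the example vector are assumptions for the illustration.

```python
def build_K(q):
    """q by q matrix with K[i][i] = 1 and K[i][0] = -1 for rows after the first; row 1 is zero."""
    K = [[0.0] * q for _ in range(q)]
    for i in range(1, q):
        K[i][0] = -1.0
        K[i][i] = 1.0
    return K

def centered(beta):
    """gamma_j = beta_j minus the mean of beta, so the gamma_j sum to 0."""
    mean = sum(beta) / len(beta)
    return [b - mean for b in beta]

def apply(K, v):
    """Matrix-vector product K v."""
    return [sum(K[i][j] * v[j] for j in range(len(v))) for i in range(len(K))]

beta = [0.0, 0.5, -1.2, 0.3]      # hypothetical difficulties with beta_1 = 0
gamma = centered(beta)
recovered = apply(build_K(len(beta)), gamma)
```

Because $\beta_1 = 0$, the mean of the $\beta_j$ equals $q^{-1}\sum_{h=2}^{q}\beta_h$, so `centered` matches the sum-to-zero parameterization used in the fixed-point argument.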

Given the definitions of $V^+$, $W$, and $K$, the following result can be derived.

Theorem 7 Under the conditions of Theorem 2, $n^{1/2}(\hat\gamma - \gamma_M)$ converges in distribution to
a multivariate normal random vector with mean equal to the $q$-dimensional zero vector 0 and
covariance matrix $W^- V^+ W^-$, and $n^{1/2}(\hat\beta - \beta_M)$ converges in distribution to a multivariate
normal random vector with mean 0 and covariance matrix $K W^- V^+ W^- K^T$.

Proof. The normal approximation is derived by conventional arguments based on
the function $F$ developed in Section 1.4. Once again, fixed point theorems are employed.
Details are omitted.

It should be noted that Theorem 7 differs from customary results for maximum
likelihood both in terms of the asymptotic mean and in terms of the asymptotic covariance
matrix. Were the customary results to hold, $n^{1/2}(\hat\beta - \beta)$ would converge in distribution
to a multivariate normal random vector with mean 0 and covariance matrix $K W^- K^T$
(Haberman, 1978, pp. 339-340).

If the number $q$ of items increases, then normal approximations remain available, but
a few changes in results are needed due to the changing dimension of $\hat\beta$ and $\hat\gamma$ and due to
the behavior of $V^+$ for large $q$. The normal approximations are somewhat unsatisfactory in
many cases due to the unconventional asymptotic mean. Let

$$\sigma^2_{+j} = E(w(\theta_1 - \beta_j))$$

be the variance of $Y_{1j} - E(Y_{1j} | \theta_1)$. Let $r$ be an integer constant greater than 1. For $q \ge r$,
let $\hat\beta_r$ be the $r$-dimensional vector with coordinates $\hat\beta_j$ for $1 \le j \le r$, let $\beta_{rM}$ be the
$r$-dimensional vector with coordinates $\beta_{jM}$ for $1 \le j \le r$, and let $\beta_r$ be the $r$-dimensional
vector with coordinates $\beta_j$ for $1 \le j \le r$. Let $\Sigma(\beta, r)$ be the $r$ by $r$ matrix with row $j$ and
column $j'$ equal to

$$\frac{\delta_{jj'}}{\sigma^2_{+j}} + \frac{1}{\sigma^2_{+1}}$$

for $j$ and $j'$ greater than 1 and equal to 0 if $j$ or $j'$ is 1. Then the following theorem may be proven.

Theorem 8 If $q$ approaches $\infty$, then $n^{1/2}(\hat\beta_r - \beta_{rM})$ converges in distribution to a
multivariate normal random vector with mean the $r$-dimensional zero vector 0 and covariance
matrix $\Sigma(\beta, r)$.

Proof. Let $v_{kjj'C} = v_{kjj'}(\beta)$ be the conditional covariance of $Y_{ij}$ and $Y_{ij'}$ given $Y_{i+} = k$,
so that

$$v_{kjj'}(\beta) = \frac{s_{kjj'}(\beta)}{s_k(\beta)} - m_{kj}(\beta) m_{kj'}(\beta),$$

where

$$s_{kjj'}(\beta) = \sum_{c \in \Gamma(k)} c_j c_{j'} \exp(-\beta^T c).$$

Obviously $v_{kjjC} > 0$ for $0 < k < q$. As shown in the appendix, $v_{kjj'C}$ is negative if $j \ne j'$
and $k$ is neither 0 nor $q$. As in the case of $m_{kjC}$, if $k$, $q$, and $n$ are selected so that $\sigma^2_{k+C}$
approaches $\infty$, then

$$\sigma^4_{k+C}\left[v_{kjjC} - \sigma^2_{kjC} + \frac{(2p_{kjC} - 1)\sigma^2_{kjC}(2p_{kjC} - 1 - \psi_k)}{2\sigma^2_{k+C}}\right], \quad 1 \le j \le q,$$

and

$$\sigma^4_{k+C}\left[v_{kjj'C} + \frac{\sigma^2_{kjC}\sigma^2_{kj'C}}{\sigma^2_{k+C}}\right], \quad 1 \le j < j' \le q,$$

are uniformly bounded.

Use of the maximum norm shows that $n^{1/2}|\hat\beta - \beta_M - Z|$ converges in probability to 0 if

$$Z = KM,$$

$$M = W^- T,$$

and $T$ is the $q$-dimensional vector with coordinates

$$T_j = n^{-1} \sum_{k=1}^{q-1} (f_{kj} - n_k m_{kjC}).$$

Indeed, use of large-deviation theory shows that $|\hat\beta - \beta_M - Z|$ is of order $n^{-1} \log q$
(Haberman, 1977b). Because the sum of the coordinates of $T$ is 0, if $\Omega'$ is the smallest value
of $\sum_{k=1}^{q-1} P_k \sigma^2_{kjM} \sigma^2_{kj'M} / \sigma^2_{k+M}$, then

$$M = (W + \Omega' \mathbf{1}\mathbf{1}^T)^{-1} T.$$

Elementary albeit somewhat tedious calculations show that the variance of $n^{1/2}(M_j - T_j/\sigma^2_{+j})$
approaches 0. Thus

$$n^{1/2}(\hat\beta_j - \beta_{jM} - T_j/\sigma^2_{+j} + T_1/\sigma^2_{+1})$$

converges in probability to 0, so that the conclusion of the theorem follows.

Theorem 8 is somewhat similar but not identical to the standard normal approximation
for maximum-likelihood estimates. The asymptotic mean of $\hat\beta_r$ would be expected to be
$\beta_r$, and the asymptotic covariance matrix would be the limit of the submatrix of $K W^- K^T$
formed from the first $r$ rows and $r$ columns. A slight change in the calculation of the
variance of $n^{1/2}(M_j - T_j/\sigma^2_{+j})$ can be used to verify that the limit of the submatrix is
$\Sigma(\beta, r)$. Thus the major issue is the difference between $\beta_M$ and $\beta$.

In practice, the asymptotic normality result is somewhat unsatisfactory. Clearly $\hat\beta_j$ is
intended to estimate $\beta_j$ rather than $\beta_{jM}$. If $n/q^2$ approaches 0, then standard results hold,
so that $n^{1/2}(\hat\beta_r - \beta_r)$ converges in distribution to a multivariate normal random vector
with mean 0 and covariance matrix $\Sigma(\beta, r)$. This result is a little stronger than is found
in the literature (Haberman, 1977b). Nonetheless, the asymptotic approximation is not
very helpful for an SAT I exam with $q$ of 60 or 78 and $n = 446{,}607$. Obviously, $n/q^2$ is
too large. As a practical matter, the results indicate that ordinary asymptotic confidence
intervals for $\beta_j$ cannot be derived by use of the normal approximation for $\hat\beta_j$. Nonetheless,
it should be emphasized that some estimation gain is achieved if the sample size $n$ is large.
Let $0 < \alpha < 1$ and let $z$ be defined so that the probability is $\alpha$ that the absolute value of a
standard normal deviate exceeds $z$. Let

$$\sigma^2(\hat\beta_j) = \frac{1}{\sigma^2_{+j}} + \frac{1}{\sigma^2_{+1}}.$$

For $j > 1$, the probability approaches $1 - \alpha$ that

$$\beta_{jM} - \beta_j - n^{-1/2} z \sigma(\hat\beta_j) < \hat\beta_j - \beta_j < \beta_{jM} - \beta_j + n^{-1/2} z \sigma(\hat\beta_j).$$

For a given number $q$ of items, the bounds become more narrow as the sample size $n$
becomes larger.
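
The size of $n/q^2$ for the SAT I data, and the way the bounds narrow with $n$, can be sketched as follows. The sample size and test lengths come from the text; the variance inputs to `half_width` are hypothetical placeholders, and the function names are illustrative.

```python
import math

def ratio_n_to_q_squared(n, q):
    """n / q^2, which must approach 0 for the standard normal approximation to hold."""
    return n / q ** 2

def half_width(n, z, var_plus_j, var_plus_1):
    """n^{-1/2} z sigma(beta_j), with sigma^2(beta_j) = 1/var_plus_j + 1/var_plus_1."""
    sigma = math.sqrt(1.0 / var_plus_j + 1.0 / var_plus_1)
    return z * sigma / math.sqrt(n)

math_ratio = ratio_n_to_q_squared(446607, 60)    # roughly 124
verbal_ratio = ratio_n_to_q_squared(446607, 78)  # roughly 73
```

Both ratios are far from 0, which is the sense in which $n/q^2$ is too large, while `half_width` shrinks at rate $n^{-1/2}$ around the shifted center $\beta_{jM} - \beta_j$.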

In the case of an individual $i$ for an increasing number $q$ of items, the normal
approximation for $\hat\theta_i$ is relatively straightforward. Here

$$\hat\theta_i = \hat\theta_{kM} = g_k(\hat\beta)$$

if $Y_{i+} = k$. Let

$$\sigma^2_{ij} = p_{ij}(1 - p_{ij})$$

be the variance of $Y_{ij}$ given $\theta_i$, let

$$\sigma^2_{i+} = \sum_{j=1}^{q} \sigma^2_{ij}$$

be the variance of $Y_{i+}$ given $\theta_i$, and let

$$\sigma(\hat\theta_i) = 1/\sigma_{i+}.$$

Then it is a fairly straightforward matter to verify that $\hat\theta_i - \theta_i - (Y_{i+} - p_{i+})/\sigma^2_{i+}$ is of order
$q^{-1}$. It follows that $(\hat\theta_i - \theta_i)/\sigma(\hat\theta_i)$ converges in distribution to a standard normal random
variable. In addition, for $r$ a finite integer, the $r$-dimensional vector with coordinates
$(\hat\theta_i - \theta_i)/\sigma(\hat\theta_i)$ for $1 \le i \le r$ converges in distribution to a multivariate normal random
vector with mean 0 and covariance matrix $I$.

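Given the assumptions on the empirical distribution of the $\beta_j$, the asymptotic standard
deviation $\sigma(\hat\theta_i)$ is readily shown to satisfy the condition that $q^{1/2}\sigma(\hat\theta_i)$ converges to
$[1/E(w(\theta_i - \beta^*))]^{1/2}$ for each $i$.

In this case, the normal approximation for $\hat\theta_i$ does hold, so that approximate confidence
intervals are available. The probability that

$$\hat\theta_i - z\hat\sigma(\hat\theta_i) < \theta_i < \hat\theta_i + z\hat\sigma(\hat\theta_i)$$

approaches $1 - \alpha$ if

$$\hat\sigma(\hat\theta_i) = 1/\hat\sigma_{i+},$$

$$\hat\sigma^2_{i+} = \sum_{j=1}^{q} \hat\sigma^2_{ij},$$

and

$$\hat\sigma^2_{ij} = \hat{p}_{ij}(1 - \hat{p}_{ij}).$$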

In practice, the normal approximation suggests limits on accuracy of estimation. Note
that $\sigma^2_{ij} \le 1/4$, so that $\sigma(\hat\theta_i) \ge 2/q^{1/2}$. For a test with $q = 60$ items, $\sigma(\hat\theta_i) \ge 0.258$, while for a
test with $q = 78$ items, $\sigma(\hat\theta_i) \ge 0.226$. For the SAT I Math, the observed estimates $\hat\sigma(\hat\theta_i)$ are
all at least 0.309 and are greater than 1 in some instances in which $\hat\theta_i$ is finite. In the case
of the Verbal exam, the minimum estimated standard deviation is 0.259 and the maximum
for $\hat\theta_i$ finite is greater than 1. These large estimates also suggest limitations in the quality
of the normal approximation.
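
The lower bound $2/q^{1/2}$ and the approximate confidence interval for an ability can be sketched as follows. The fitted probabilities passed to `theta_interval` are hypothetical inputs, the default `z = 1.96` corresponds to a level of 0.05, and the function names are illustrative.

```python
import math

def min_sd(q):
    """Lower bound 2 / sqrt(q) on the asymptotic standard deviation of an ability estimate."""
    return 2.0 / math.sqrt(q)

def theta_interval(theta_hat, p_hat, z=1.96):
    """Approximate interval theta_hat -/+ z / sqrt(sum_j p_hat_j (1 - p_hat_j))."""
    var_sum = sum(p * (1.0 - p) for p in p_hat)
    sd = 1.0 / math.sqrt(var_sum)
    return theta_hat - z * sd, theta_hat + z * sd

bound_math = min_sd(60)    # about 0.258
bound_verbal = min_sd(78)  # about 0.226
```

The bound is attained only when every fitted probability equals 1/2, so observed standard deviations such as 0.309 or more reflect items that are too easy or too hard for the examinee.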

Some obvious limitations are evident for the normal approximation for the $\hat\theta_i$. Observe
that, for $1 \le i < k \le n$, the probability that $\hat\theta_i$ and $\hat\theta_k$ are unequal is the probability
that $Y_{i+} \ne Y_{k+}$. Thus the probability that, for randomly chosen $i$ and $k$, $\hat\theta_i \ne \hat\theta_k$ is the
probability that $Y_{1+} \ne Y_{2+}$. This probability is the Gini concentration

$$1 - \sum_{k=0}^{q} P_k^2$$

of $Y_{1+}$ (Gini, 1912). An unbiased estimate of this measure is

$$1 - \sum_{k=0}^{q} \frac{n_k(n_k - 1)}{n(n - 1)}.$$

In the SAT I Math, this estimate is 0.976, while for the Verbal examination, the estimate is
0.980. In contrast, if each $\hat\theta_i$ had an independent normal distribution, then the probability
that $\hat\theta_i \ne \hat\theta_k$ would be 1 for $i \ne k$.

Normal approximations for $\hat{H}_J$ and $\hat{H}_M$ are somewhat unsatisfactory in practice due to
the relatively large estimation biases involved.
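
An unbiased estimate of the Gini concentration can be computed from the score counts alone. The sketch below uses the standard estimate that replaces $P_k^2$ by $n_k(n_k - 1)/[n(n - 1)]$, which equals the fraction of examinee pairs with different total scores; the function name and the toy counts are assumptions for the illustration.

```python
def gini_concentration_estimate(score_counts):
    """1 - sum_k n_k (n_k - 1) / [n (n - 1)] for counts n_k of examinees with total score k."""
    n = sum(score_counts)
    if n < 2:
        raise ValueError("at least two examinees are required")
    same = sum(nk * (nk - 1) for nk in score_counts)
    return 1.0 - same / (n * (n - 1))

# Toy example: two examinees at score 0 and two at score 1.
toy = gini_concentration_estimate([2, 2])   # 4 of the 6 pairs differ
```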

1.6 Generalized Residuals 

Generalized residuals based on JMLE can be considered for the Rasch model as a tool
for detection of model deficiencies (Haberman, 1978, chap. 5). Several possible approaches
exist, but none is very satisfactory. For a simple possibility, consider fixed constants $d_{ij}$
for $1 \le i \le n$ and $1 \le j \le q$. Assume that $d_{ij}$ is not additive in $i$ and $j$, so that no $a_i$,
$1 \le i \le n$, and $b_j$, $1 \le j \le q$, exist such that $d_{ij} = a_i + b_j$ for $1 \le i \le n$ and $1 \le j \le q$. The
raw generalized residual corresponding to the $d_{ij}$ is

$$e = \sum_i \sum_j d_{ij}(Y_{ij} - \hat{p}_{ij}) = O - \hat{E},$$

where

$$O = \sum_i \sum_j d_{ij} Y_{ij}$$

and

$$\hat{E} = \sum_i \sum_j d_{ij} \hat{p}_{ij}$$

is the estimated expected value of $O$. The adjusted generalized residual is

$$z = e/\hat\sigma(e),$$

where $\hat\sigma^2(e)$ is the minimum of

$$\sum_i \sum_j \hat\sigma^2_{ij}(d_{ij} - a_i - b_j)^2$$

for real $a_i$ and $b_j$ such that $b_1 = 0$.

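For a very simple example, consider a check of whether a set $A$ of $N_A$ examinees has
unusual behavior on a set $B$ of $N_B$ items. A possible choice is $d_{ij} = 1$ for $i$ in $A$ and $j$ in
$B$ and $d_{ij} = 0$ otherwise. If $N_A/n$ is very small and $\hat\sigma^2_{iB}$ is $\sum_{j \in B} \hat\sigma^2_{ij}$, then $\hat\sigma^2(e)$ is well
approximated by

$$\sum_{i \in A} \frac{\hat\sigma^2_{iB}(\hat\sigma^2_{i+} - \hat\sigma^2_{iB})}{\hat\sigma^2_{i+}}.$$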

In customary asymptotic theory, $z$ has an approximate standard normal distribution if
the model holds and if

$$[\sigma(e)]^{-1} \max_i \max_j |d_{ij}|$$

approaches 0, where $\sigma^2(e)$ is the minimum of

$$\sum_i \sum_j \sigma^2_{ij}(d_{ij} - a_i - b_j)^2$$

for real $a_i$ and $b_j$ such that $b_1 = 0$ (Haberman, 1978). In the context of joint estimation, one
must consider the case in which the sample size $n$ and the number of items $q$ both approach
$\infty$ and the $d_{ij}$ depend on $n$ and $q$. It is clearly necessary that the number of nonzero $d_{ij}$
approach $\infty$.

Unfortunately, the requirement on the number of nonzero $d_{ij}$ creates a problem
with the linear approximations to the joint maximum-likelihood estimates on which the
generalized residuals are based. Because $p_{kjM} - p_{kjC}$ is of order $q^{-1}$, the difference $\hat{p}_{ij} - p_{ij}$
cannot be expected to be of order less than $q^{-1}$ even if the sample size is quite large. With
some calculation that exploits the fact that $\sigma^2_{ij} = w(\theta_i - \beta_j)$ for the bounded and continuous
function $w$ defined in Section 1.4, it is possible to show that $z$ converges in distribution to
a standard normal random variable if $q^{-1}u/\sigma(e)$ approaches 0, where $u$ is the minimum of
$\sum_{i=1}^{n} \sum_{j=1}^{q} |d_{ij} - a_i - b_j|$ for $a_i$ and $b_j$ real. This condition is quite restrictive in practice.

For example, recall the case with the set $A$ of examinees and the set $B$ of items. Then
the condition on $q^{-1}u/\sigma(e)$ only holds if $N_A N_B/q^2$ approaches 0. Nonetheless, one rather
simple application is of some interest. For the SAT I data under study, consider a Rasch
model in which the Verbal and Math tests are combined together into a test with 138 items.
Let the last 60 items come from the Math test. For individual $h$, consider the number of
correct responses on the Math test. In this fashion, one might consider $d_{ij} = 1$ for $i = h$
and $j \ge 79$ and $d_{ij} = 0$ otherwise. Thus

$$O = \sum_{j=79}^{138} Y_{hj}$$

and

$$\hat{E} = \sum_{j=79}^{138} \hat{p}_{hj}.$$

Because the number $n$ of examinees is very large, $\hat\sigma^2(e)$ is nearly equal to $\hat\sigma^2_{hM}\hat\sigma^2_{hV}/\hat\sigma^2_{h+}$ if

$$\hat\sigma^2_{hV} = \sum_{j=1}^{78} \hat\sigma^2_{hj}$$

and

$$\hat\sigma^2_{hM} = \sum_{j=79}^{138} \hat\sigma^2_{hj}.$$
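
The individual-level residual just described can be sketched as follows. The sketch assumes fitted probabilities for one examinee are already available and uses the large-sample approximation in which the variance of the raw residual is the product of the two subtest variance sums divided by their total. The function name, argument layout, and toy data are assumptions for the illustration.

```python
import math

def individual_residual(y, p_hat, subset):
    """Adjusted residual for one examinee's number correct on the items in `subset`.

    y      : list of 0/1 responses for the examinee
    p_hat  : fitted probabilities of a correct response, one per item
    subset : indices of the items whose total is checked (for example, the Math items)
    """
    observed = sum(y[j] for j in subset)
    expected = sum(p_hat[j] for j in subset)
    variances = [p * (1.0 - p) for p in p_hat]
    var_subset = sum(variances[j] for j in subset)
    var_total = sum(variances)
    # Approximate variance: var_subset * (var_total - var_subset) / var_total.
    var_e = var_subset * (var_total - var_subset) / var_total
    if var_e <= 0.0:
        raise ValueError("residual variance is not positive")
    return (observed - expected) / math.sqrt(var_e)

# Toy data: four items, the first two forming the subset of interest.
z_toy = individual_residual([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5], [0, 1])
```

In the toy call the examinee answers both subset items correctly and both remaining items incorrectly, which yields a residual of 2.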

Although concerns about the accuracy of the normal approximation are clearly in
order if $\hat\sigma(e)$ is relatively small, it is noteworthy that this form of residual analysis is quite
adequate to indicate severe problems with the combined Rasch model. Some 144,082
examinees exist with generalized residuals that exceed 2 in magnitude out of 446,603
examinees with a positive value of $\hat\sigma(e)$. If the Rasch model really held, then the normal
approximation implies that only about 23,000 examinees would be expected to have
generalized residuals that exceed 2 in absolute value. Of course, no reasonable person
would expect a combined Rasch model to apply to tests as different as the Verbal and Math
tests. Nonetheless, it is useful to note that analysis of generalized residuals for individual
examinees can detect this problem in a substantial fraction of all examinees.


1.7 Model Error 

Some further complications arise when the model is not simply assumed to be true.
There does not appear to be any treatment of this case in the literature, but existing
arguments can be modified easily in this case (Haberman, 1977b; Haberman, 1989; Gilula
& Haberman, 1994; Gilula & Haberman, 1995).

Consider the case in which the $Y_{ij}$ need not satisfy the Rasch model. Assume that the
$Y_i$ are independent and identically distributed and the probability $p_J(c)$ that $Y_i = c$ is
positive for each $q$-dimensional vector $c$ with each coordinate 0 or 1. Let $P_k$ still be the
probability that $Y_{1+} = k$, and let $m_{kjC}$ be the conditional probability that $Y_{1j} = 1$ given
that $Y_{1+} = k$ for $1 \le j \le q$ and $0 \le k \le q$. Note that for $k = 0$, $m_{kjC} = 0$, while for $k = q$,
$m_{kjC} = 1$. Let $p_j$ be the probability that $Y_{1j} = 1$. In this case, the same arguments used to
verify existence of joint maximum-likelihood estimates also imply the existence of $\theta_{kM}$ and
$\beta_{jM}$ such that $\beta_{1M} = 0$,

$$p_{kjM} = \mathrm{lgt}(\theta_{kM} - \beta_{jM}),$$

$$\sum_{j=1}^{q} p_{kjM} = \sum_{j=1}^{q} m_{kjC} = k,$$

and

$$\sum_{k=1}^{q-1} P_k p_{kjM} = \sum_{k=1}^{q-1} P_k m_{kjC} = p_j - P_q.$$

One may define $H_J$ and $H_M$ as in the case in which the Rasch model is valid.

For $q$ fixed, it remains true that $\hat\beta_j$ converges almost surely to $\beta_{jM}$ and $\hat\theta_{kM}$ converges
almost surely to $\theta_{kM}$. For asymptotic normality, define $V^+$ and $W$ as in the case of the
Rasch model correct. Then it is a relatively straightforward matter to demonstrate that
$n^{1/2}(\hat\beta - \beta_M)$ converges in distribution to a multivariate normal random vector with mean
0 and covariance matrix $K W^- V^+ W^- K^T$. It remains true that $\hat{H}_J$ converges almost surely
to $H_J$ and $\hat{H}_M$ converges almost surely to $H_M$.

The results for $q$ fixed are readily generalized to the case of $q$ increasing by use of
arguments similar to those required for the case in which the model is correct. Wording
of results must be modified to some degree because $\beta_{jM}$ depends on $q$. It is simplest to
assume that the $\beta_{jM}$ are uniformly bounded as $q$ increases, as is the case if the Rasch model
is valid. To avoid somewhat pathological cases, it is also helpful to assume, as is the case
for the Rasch model, that positive constants $\alpha_1$ and $\alpha_2$ exist such that, for any $q > 1$,

$$\alpha_1 x^T x \le x^T V^+ x \le \alpha_2 x^T x$$

whenever $\sum_{j=1}^{q} x_j = 0$. It remains true that $|\hat\beta - \beta_M|$ converges in probability to 0.
Asymptotic normality results are also available, although some simplifications do not apply.
Let $\sigma^2_e(\hat\beta_j)$ be row $j$ and column $j$ of the matrix $K W^- V^+ W^- K^T$. Then the standardized
value $n^{1/2}(\hat\beta_j - \beta_{jM})/\sigma_e(\hat\beta_j)$ converges in distribution to a standard normal deviate.

On the whole, use of JMLE appears to have an intermediate status. Parameter
estimation is feasible to the extent that the $\hat\beta_j$ and $\hat\theta_i$ do approximate the quantities
they estimate under realistic testing conditions if the Rasch model holds. On the other
hand, severe limitations exist on basic tools for statistical inference such as approximate
confidence intervals and generalized residuals. It appears difficult to advocate use of JMLE
except for preliminary estimation of parameters.


2 Conditional Maximum Likelihood 


Conditional maximum-likelihood estimation is applicable to the Rasch model
(Andersen, 1973a). In this case, inference is conditional on the observed examinee sums
$Y_{i+}$. For $c$ in $\Gamma(k)$ and for $0 \le k \le q$, the conditional probability $p_{JC}(c)$ that $Y_i = c$ given
that $Y_{i+} = k$ satisfies

$$p_{JC}(c) = p_J(c)/P_k.$$

Under the Rasch model,

$$P_k = s_k(\beta) \int_{-\infty}^{\infty} e^{k\theta} V(\beta, \theta)\, dD(\theta),$$

so that

$$p_{JC}(c) = \exp(-\beta^T c)/s_k(\beta) \qquad (17)$$

does not depend on the distribution function $D$ of the ability $\theta_1$. The conditional log
likelihood function is then

$$\ell_C(p_{JC}) = \sum_{i=1}^{n} \log p_{JC}(Y_i)$$

for the array $p_{JC}$ of $p_{JC}(c)$ for $c$ in $\Gamma$. Thus

$$\ell_C(\beta) = -\beta^T Y_+ - \sum_{k=0}^{q} n_k \log s_k(\beta).$$


Because

$$Y_{+j} = \sum_{k=0}^{q} f_{kj},$$

$\ell_C(\beta)$ is determined by the $f_{kj}$, and inferences again may be based on the collapsed table.
The relationship of conditional and marginal maximum likelihood is relatively simple. Let
$P$ be the vector with coordinates $P_k$ for $0 \le k \le q$, and let

$$\ell_S(P) = \sum_{k=0}^{q} n_k \log P_k$$

be the marginal log likelihood for the examinee totals $Y_{i+}$, $1 \le i \le n$, under the unrestricted
model that $Y_{i+} = k$ with probability $P_k$ for some nonnegative $P_k$ such that $\sum_{k=0}^{q} P_k = 1$.
Then

$$\ell(p_J) = \ell_C(p_{JC}) + \ell_S(P).$$

Let $\ell_{CM}$ be the maximum of $\ell_C(p_{JC})$ under the constraint that (17) holds for some $\beta$ with
$\beta_1 = 0$. Let $\ell_{SM}$ be the maximum

$$\sum_{k=0}^{q} n_k \log(n_k/n)$$

of $\ell_S(P)$. Then

$$\ell_M \le \ell_{CM} + \ell_{SM}.$$

As discussed in Section 3, it is commonly true that $\ell_M$ is the sum $\ell_{CM} + \ell_{SM}$.

The conditional maximum-likelihood estimate $\hat\beta_C$, if it exists, satisfies $\hat\beta_{1C} = 0$,

$$\hat{p}_{JC}(c) = \exp(-\hat\beta_C^T c)/s_k(\hat\beta_C)$$

for $c$ in $\Gamma(k)$ and $0 \le k \le q$, and

$$\ell_C(\hat{p}_{JC}) = \ell_{CM}.$$

The notable feature here is that the conditional log-likelihood function does not involve the
common distribution function $D$ of the examinee abilities $\theta_i$. If $\hat\beta_C$ exists, then it satisfies
the conditional maximum-likelihood equations

$$\hat{m}_{kjC} = m_{kj}(\hat\beta_C)$$

and

$$\sum_{k=0}^{q} n_k \hat{m}_{kjC} = Y_{+j}$$

for $1 \le j \le q$. Conversely, if $\beta'_C$ satisfies the conditions that $\beta'_{1C} = 0$ and

$$\sum_k n_k m_{kj}(\beta'_C) = Y_{+j},$$

then $\beta'_C$ is a conditional maximum-likelihood estimate of $\beta$. Provided that $n_0 + n_q < n$, no
more than one conditional maximum-likelihood estimate $\hat\beta_C$ exists. If $n_0 + n_q$ is $n$, then
$\ell_C(\beta)$ is constant, so that any $q$-dimensional vector with first coordinate 0 is a conditional
maximum-likelihood estimate. In this instance, the arbitrary choice of $\hat\beta_{jC} = 0$ for all $j$ may
be made.
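
The conditional log likelihood depends on the data only through the item totals and the score counts, and it can be evaluated with elementary symmetric functions. The sketch below is illustrative: the function names and argument layout are assumptions, and no attempt is made at the numerical safeguards that long tests would need.

```python
import math

def elementary_symmetric(eps):
    """s[k] = sum over all subsets of size k of the product of the eps values."""
    s = [1.0] + [0.0] * len(eps)
    for count, e in enumerate(eps, start=1):
        for k in range(count, 0, -1):
            s[k] += e * s[k - 1]
    return s

def conditional_log_likelihood(beta, item_totals, score_counts):
    """-beta' Y_+ - sum_k n_k log s_k(beta).

    beta         : item difficulties (first coordinate conventionally 0)
    item_totals  : number of correct responses Y_+j for each item j
    score_counts : number n_k of examinees with total score k, for k = 0, ..., q
    """
    s = elementary_symmetric([math.exp(-b) for b in beta])
    linear = sum(b * y for b, y in zip(beta, item_totals))
    return -linear - sum(nk * math.log(s[k]) for k, nk in enumerate(score_counts))
```

When every examinee has a total score of 0 or $q$, the two terms cancel for every choice of difficulties, which is the constant case noted above.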

Existence of conditional maximum-likelihood estimates is an issue, although normally
a much less important one than in the case of joint estimation. Let $N(c)$, $c$ in $\Gamma(k)$,
$0 \le k \le q$, be the number of examinees $i$ with $Y_{ij} = c_j$ for each item $j$. To find existence
conditions for $\hat\beta_C$ for the case of $n_0 + n_q < n$, let $\delta_{ab}$ be the Kronecker $\delta$ function with
$\delta_{ab} = 1$ for $a = b$ and $\delta_{ab} = 0$ for $a \ne b$. The following theorem is then available.

Theorem 9 The conditional maximum likelihood estimate $\hat\beta_C$ fails to exist if, and only if,
$\alpha_j$, $1 \le j \le q$, and $\tau_k$, $0 < k < q$, can be found such that the following conditions hold:

1. If $k$ is in $K$, $0 < k < q$, and $c$ is in $\Gamma(k)$, then

$$\tau(c) = \sum_j \alpha_j c_j + \tau_k \le 0.$$

2. For some integer $k$ in $K$ such that $0 < k < q$ and some $c$ in $\Gamma(k)$, $\tau(c) < 0$.

3. The product $N(c)\tau(c) = 0$ for all $c$ in $\Gamma(k)$ and $k$ in $K$ such that $0 < k < q$

(Haberman, 1974, chap. 2). Obviously, $\hat\beta_C$ must exist if $N(c) > 0$ for each $c$ in $\Gamma(k)$
and each $k$ in $K$ such that $0 < k < q$. An equivalent result is available with a somewhat
different appearance.

Theorem 10 The conditional maximum likelihood estimate $\hat\beta_C$ exists if, and only if, there
is no real $a$ and $b$ such that the following conditions hold:

1. $a$ and $b$ are not integers,

2. $1 < a < q - 1$ and $n_0 < b < n - n_0$,

3. $f_{kj} = 0$ for $k < a$ and $Y_{+j} < b$,

4. $f_{kj} = n_k$ for $k > a$ and $Y_{+j} > b$,

5. $k$ in $K$ and $j$ exist such that $n_k > 0$ and either $0 < k < a$ and $Y_{+j} < b$ or $a < k < q$
and $Y_{+j} > b$.

Proof. A simple modification of existing proofs for JMLE is required (Haberman, 
1977b, pp. 821-822). 

Existence results presented here appear to be consistent with but simpler than those 
previously available (Fischer, 1981). 

The fundamental change relative to JMLE is that $n_0$ and $n_q$ need not be 0. Instead of
the requirement that $Y_{+j}$ not be 0 in order for joint maximum-likelihood estimates to exist,
it is now necessary that $Y_{+j}$ exceeds $n_0$ but is less than $n - n_q$. It is important to observe
that conditional maximum-likelihood estimates exist if unconditional maximum-likelihood
estimates exist.

Extended conditional maximum-likelihood estimates may be considered if $\hat\beta_C$ does not
exist. An array $\hat p_{JC}$ of extended conditional maximum-likelihood estimates $\hat p_{JC}(c)$ of $p_{JC}(c)$,
$c$ in $\Gamma$, exists such that, whenever $\ell_C(p_{JC})$ approaches $\hat\ell_{CM}$ for $p_{JC}$ such that (17) holds
and $\beta_1 = 0$, $p_{JC}(c)$ approaches $\hat p_{JC}(c)$ for $c$ in $\Gamma(k)$ and either $k$ in $K$ or $k$ equal 0 or $q$.
The convention is adopted that $\hat p_{JC}(c) = k!(q - k)!/q!$ if $n_0 + n_q = n$. If $\hat\beta_C$ exists, then the
new definition reduces to the conventional definition of $\hat p_{JC}(c)$. If $\hat m_{kjC}$ is the sum of $\hat p_{JC}(c)$
for $c$ in $\Gamma(k)$ with $c_j = 1$, then

$$\sum_{k=0}^{q} n_k \hat m_{kjC} = Y_{+j}$$

and

$$\sum_{j=1}^{q} \hat m_{kjC} = k$$

for $k$ in $K$. If the conditional maximum-likelihood estimate $\hat\beta_C$ exists, then $\hat m_{kjC} = m_{kj}(\hat\beta_C)$
for $k$ in $K$. Various conventions can be considered to define $\hat\beta_C$ in the case in which no
conditional maximum-likelihood estimate exists for $\beta$.

Given the estimates $\hat\beta_{jC}$, it is possible to estimate the examinee abilities $\theta_i$. For each $i$,
the log likelihood for $\theta_i$ given the $\hat\beta_{jC}$ is

$$\sum_{j=1}^{q} \{Y_{ij} \log \operatorname{lgt}(\theta_i - \hat\beta_{jC}) + (1 - Y_{ij}) \log[1 - \operatorname{lgt}(\theta_i - \hat\beta_{jC})]\}.$$

The maximum is achieved by solving the equation

$$\hat p_{i+C} = Y_{i+}$$

for

$$\hat p_{ijC} = \operatorname{lgt}(\hat\theta_{iC} - \hat\beta_{jC})$$

and

$$\hat p_{i+C} = \sum_{j=1}^{q} \hat p_{ijC}.$$

If $Y_{i+} = q$, $\hat\theta_{iC} = \infty$, while if $Y_{i+} = 0$, $\hat\theta_{iC} = -\infty$.
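The ability equation above can be solved numerically for each raw score. The following is a minimal sketch, not the report's implementation: the function names, the starting value, and the step clamp are illustrative assumptions, and `lgt` is assumed to be the logistic function $1/(1 + e^{-x})$.

```python
import math

def lgt(x):
    """Logistic function, assumed here to be lgt(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def ability_estimate(beta, score, tol=1e-10, max_iter=100):
    """Solve sum_j lgt(theta - beta_j) = score for theta by Newton-Raphson.

    beta  : list of item difficulty estimates
    score : raw score Y_{i+} of the examinee
    Returns -inf for a zero score and +inf for a perfect score.
    """
    q = len(beta)
    if score <= 0:
        return -math.inf
    if score >= q:
        return math.inf
    # Illustrative starting value: mean difficulty plus the logit of score/q.
    theta = sum(beta) / q + math.log(score / (q - score))
    for _ in range(max_iter):
        p = [lgt(theta - b) for b in beta]
        # Newton step: residual of the equation over its derivative in theta.
        step = (score - sum(p)) / sum(pj * (1.0 - pj) for pj in p)
        # Clamp the step as a simple safeguard (an assumption of this sketch).
        step = max(-2.0, min(2.0, step))
        theta += step
        if abs(step) < tol:
            break
    return theta
```

Because the solution depends on the responses only through the raw score, at most $q - 1$ finite values need to be computed for a given set of item estimates.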

2.1 Large-sample Properties 

For $q$ fixed, if the Rasch model is valid and $n$ becomes large, then $\hat\beta_C$ is a consistent and
asymptotically normal estimate for $\beta$ (Andersen, 1973a; Haberman, 1977a). The argument
in the second citation permits generalization to the case of $q$ increasing. Let

$$V_{jj'C} = V_{jj'}(\beta) = \sum_{k=1}^{q-1} n_k v_{kjj'C}$$

be the conditional covariance of $Y_{+j}$ and $Y_{+j'}$ given the $n_k$ for $0 \le k \le q$, so that $V_{jj'C} < 0$ if
$j \ne j'$ and if $n_0 + n_q < n$, and $V_{jjC}$ is positive for $n_0 + n_q < n$. Let $V_C = V(\beta)$ be the $q$
by $q$ matrix with row $j$ and column $j'$ equal to $V_{jj'}(\beta)$ for $1 \le j \le q$ and $1 \le j' \le q$. This
matrix is of rank $q - 1$ if $n_0 + n_q < n$. Let $V^*$ be the expected value of $n^{-1}V_C$, so that $V^*$
is obtained from $n^{-1}V_C$ by substitution of $P_k$ for $n_k/n$. Note that if $Y^*_{ij}$ is the random variable
equal to $Y_{ij} - m_{kjC}$ for $Y_{i+} = k$ and if $Y^*_i$ is the vector with coordinates $Y^*_{ij}$ for $1 \le j \le q$,
then $V^*$ is the covariance matrix of $Y^*_i$ for each observation $i$. Arguments rather similar
to those applied in the case of joint maximum-likelihood estimation may also be applied
to conditional maximum-likelihood estimation. If the number $q$ of items is fixed, then $\hat\beta_C$
converges almost surely to $\beta$ and $n^{1/2}(\hat\beta_C - \beta)$ converges in distribution to a multivariate
normal random variable with mean 0 and covariance matrix $K(V^*)^-K^T$. If $q$ approaches $\infty$,
then $\max_{1 \le j \le q} |\hat\beta_{jC} - \beta_j|$ converges in probability to 0. For an integer $r \ge 1$, let $\hat\beta_{rC}$ be the
$r$-dimensional vector with coordinates $\hat\beta_{jC}$ for $1 \le j \le r$, and let $\beta_r$ be the $r$-dimensional
vector with coordinates $\beta_j$ for $1 \le j \le r$. Then $n^{1/2}(\hat\beta_{rC} - \beta_r)$ converges in distribution
to a multivariate normal random vector with mean 0 and the covariance matrix $\Sigma(\beta_r)$
encountered in the discussion of the normal approximation for $n^{1/2}(\hat\beta_r - \beta_{rM})$. The gain
for conditional estimation is quite major, for the asymptotic approximations involve the
actual parameters of interest, namely the $\beta_j$, rather than the $\beta_{jM}$ parameters. As in JMLE,
it should be noted that $\Sigma(\beta_r)$ is the limit of the matrix formed from the first $r$ rows and
columns of $K(V^*)^-K^T$. Let $\hat V$ be $V(\hat\beta_C)$. Then both for $q$ fixed and $q$ increasing, asymptotic
confidence intervals for parameters such as $\beta_j$ are easily constructed by estimation of the
asymptotic standard deviation $\sigma(\hat\beta_{jC})$ of $\hat\beta_{jC}$ by the square root of the element in row $j$ and
column $j$ of $K\hat V^-K^T$.

If $q$ approaches $\infty$, then the asymptotic properties of $\hat\theta_{iC}$ are essentially the same
as those for $\hat\theta_i$, as far as consistency, asymptotic normality, and approximate confidence
intervals are concerned. Estimation of the distribution of $\theta_1$ can be implemented in
essentially the same fashion as in JMLE by substitution of $\hat\theta_{iC}$ for $\hat\theta_i$.

Estimation of the entropy measures $H_C$ and $H_U$ involves relatively little difficulty, for
$H_C$ may be estimated by

$$\hat H_{CR} = -\frac{1}{nq}\hat\ell_{CM},$$

$H_+$ may be estimated by

$$\hat H_+ = -\frac{1}{nq}\hat\ell_{SM},$$

and $H_U$ may be estimated by

$$\hat H_{UR} = \hat H_{CR} + \hat H_+.$$

For $q$ constant, $\hat H_{CR}$ converges almost surely to $H_C$, $\hat H_+$ converges almost surely to $H_+$, and
$\hat H_{UR}$ converges almost surely to $H_U$. For $q$ increasing, $\hat H_{CR} - H_C$, $\hat H_+ - H_+$, and $\hat H_{UR} - H_U$
all converge in probability to 0. Normal approximations are readily available, at least if $q/n$
approaches 0. Let $\sigma(H_U)$ be the standard deviation of $q^{-1}\log p_J(Y_i)$, and let $\sigma(H_C)$ be
the standard deviation of $q^{-1}\log p_{JC}(Y_i)$. For a fixed number $q$ of items, $n^{1/2}(\hat H_{UR} - H_U)$
converges in distribution to a normal random variable with mean 0 and variance $\sigma^2(H_U)$,
and $n^{1/2}(\hat H_{CR} - H_C)$ converges in distribution to a normal random variable with mean 0
and variance $\sigma^2(H_C)$. For $q$ increasing, one may exploit the conditional independence of
the $Y_{ij}$ given $\theta_i$ to examine the distribution of the average

$$-q^{-1}\log p_J(Y_i) = q^{-1}\sum_{j=1}^{q} H(Y_{ij}, p_{ij})$$

given $\theta_i$ and the distribution of $Y_{i+}$ given $\theta_i$. One may demonstrate that both $\sigma(H_C)$ and
$\sigma(H_U)$ converge to the standard deviation $\tau$ of $\kappa(\theta_i)$, where

$$\kappa(t) = E(H(\operatorname{lgt}(t - \beta^*), \operatorname{lgt}(t - \beta^*))).$$

In this case, both $n^{1/2}(\hat H_{UR} - H_U)$ and $n^{1/2}(\hat H_{CR} - H_C)$ converge in distribution to a
normal random variable with mean 0 and variance $\tau^2$. These results are readily applied
to construction of approximate confidence intervals for $H_C$ and $H_U$ (Gilula & Haberman,
1995). It is sometimes worth noting that

$$\sigma^2(H_U) = q^{-2}\beta^T V^*\beta + \sigma^2(z)$$

for a random variable $z$ with value

$$q^{-1}\left(\sum_{j=1}^{q} \beta_j m_{kjC} + \log[s_k(\beta)/P_k]\right)$$

for $Y_{i+} = k$. Estimation of $\sigma^2(H_U)$ and $\sigma^2(H_C)$ is straightforward. Let

$$\hat p_J(c) = \hat p_{JC}(c)\,n_k/n$$

for $c$ in $\Gamma(k)$, and let

$$\hat p_{JCi} = \hat p_{JC}(Y_i)$$

and

$$\hat p_{Ji} = \hat p_J(Y_i).$$

Then $\sigma^2(H_U)$ may be estimated by

$$\hat\sigma^2(H_{UR}) = n^{-1}\sum_{i=1}^{n} [-q^{-1}\log \hat p_{Ji} - \hat H_{UR}]^2,$$

and $\sigma^2(H_C)$ may be estimated by

$$\hat\sigma^2(H_{CR}) = n^{-1}\sum_{i=1}^{n} [-q^{-1}\log \hat p_{JCi} - \hat H_{CR}]^2.$$
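The estimate of an expected log penalty and of its standard deviation can be computed directly from the fitted log probabilities of the observed response vectors. The sketch below is illustrative only: the function name and its inputs are assumptions, and the fitted log probabilities (natural logarithms of $\hat p_{Ji}$ or $\hat p_{JCi}$) are taken as given.

```python
import math

def entropy_estimate(log_probs, q):
    """Estimated expected log penalty per item and its variability.

    log_probs : natural logs of the fitted probabilities of each examinee's
                observed response vector (one value per examinee)
    q         : number of items
    Returns (h_hat, sd_hat, se_hat), where sd_hat estimates the standard
    deviation of the per-examinee penalty and se_hat = sd_hat / sqrt(n).
    """
    n = len(log_probs)
    penalties = [-lp / q for lp in log_probs]       # -q^{-1} log p for each i
    h_hat = sum(penalties) / n                      # average penalty per item
    var_hat = sum((x - h_hat) ** 2 for x in penalties) / n
    sd_hat = math.sqrt(var_hat)
    return h_hat, sd_hat, sd_hat / math.sqrt(n)
```

Under the normal approximation described above, an approximate 95 percent confidence interval is `h_hat` plus or minus 1.96 times `se_hat`.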

For comparisons, it is often useful to consider entropy measures under the assumption
that $\beta_j = 0$, so that item difficulties are all the same, and under the assumption that $\theta_i$ is
constant, so that examinees all have the same ability. Under the model of constant item
difficulties, the conditional probability $p_{JC}(c)$ must be $k!(q - k)!/q!$ for $c$ in $\Gamma(k)$, so that
the expected log penalty per item for $Y_i$ given $Y_{i+}$ is

$$H_{CA} = q^{-1}\sum_{k=0}^{q} P_k \log\frac{q!}{k!(q - k)!}$$

and the expected logarithmic penalty per item for $Y_i$ is

$$H_{UA} = H_{CA} + H_+.$$

The obvious estimates are

$$\hat H_{CA} = (nq)^{-1}\sum_{k=0}^{q} n_k \log\frac{q!}{k!(q - k)!}$$

and

$$\hat H_{UA} = \hat H_{CA} + \hat H_+,$$

respectively. Normal approximations and approximate confidence intervals are easily
obtained (Gilula & Haberman, 1994; Gilula & Haberman, 1995). For the case of no ability
effects, the Rasch model is equivalent to the model in which the $Y_{ij}$ are independently
distributed. In this case, the expected logarithmic penalty per item for $Y_i$ is

$$H_{UI} = q^{-1}\sum_{j=1}^{q} H(p_j, p_j).$$

The obvious estimate is

$$\hat H_{UI} = q^{-1}\sum_{j=1}^{q} H(Y_{+j}/n, Y_{+j}/n).$$
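The baseline estimates for the two comparison models need only the score counts $n_k$ and the item totals $Y_{+j}$. The following sketch rests on two assumptions that are not confirmed by this section: that $\hat H_+ = -(nq)^{-1}\sum_k n_k \log(n_k/n)$, and that $H(y, p) = -y\log p - (1 - y)\log(1 - p)$. The function names are hypothetical.

```python
import math

def log_binomial(q, k):
    """log of q! / (k! (q - k)!), computed with log-gamma for stability."""
    return math.lgamma(q + 1) - math.lgamma(k + 1) - math.lgamma(q - k + 1)

def h_plus(score_counts):
    """Assumed form of the score-entropy estimate: -(nq)^-1 sum n_k log(n_k/n)."""
    q = len(score_counts) - 1
    n = sum(score_counts)
    return -sum(nk * math.log(nk / n) for nk in score_counts if nk > 0) / (n * q)

def h_ca(score_counts):
    """Constant item difficulties: (nq)^-1 sum_k n_k log[q!/(k!(q-k)!)]."""
    q = len(score_counts) - 1
    n = sum(score_counts)
    return sum(nk * log_binomial(q, k) for k, nk in enumerate(score_counts)) / (n * q)

def h_ua(score_counts):
    """Unconditional version under constant item difficulties."""
    return h_ca(score_counts) + h_plus(score_counts)

def h_ui(item_totals, n):
    """No ability effects: q^-1 sum_j H(Y_{+j}/n, Y_{+j}/n)."""
    def h(p):
        # Assumed penalty H(p, p) = -p log p - (1 - p) log(1 - p).
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
    return sum(h(y / n) for y in item_totals) / len(item_totals)
```

Here `score_counts[k]` holds $n_k$ for $k = 0, \ldots, q$ and `item_totals[j]` holds the number of correct responses to item $j + 1$.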
2.2 The Newton-Raphson Algorithm

The Newton-Raphson algorithm is readily applied to computation of $\hat\beta_C$ when the
ordinary conditional maximum-likelihood estimate exists (Andersen, 1972). One begins
with a preliminary approximation $\beta_0$ to $\hat\beta_C$, typically the joint maximum-likelihood
estimate $\hat\beta$. One then uses the iterations

$$\beta_{t+1} = \beta_t - K[V(\beta_t)]^-[f_+ - m_+(\beta_t)],$$

where $f_+$ is the $q$-dimensional vector with elements

$$f_{+j} = \sum_{k=1}^{q-1} f_{kj} = Y_{+j} - n_q$$

and $m_+(\beta_t)$ is the $q$-dimensional vector with elements

$$m_{+j}(\beta_t) = \sum_{k=1}^{q-1} n_k m_{kj}(\beta_t).$$

In typical cases, $\beta_t$ converges quite rapidly to $\hat\beta_C$.

This algorithm has traditionally been difficult to apply for large values of $q$; however,
considerable simplification in computations may be achieved by exploitation of the random
variables $U_{kj}$ previously used to study $m_{kjC} = m_{kj}(\beta)$ and $v_{kjj'C} = v_{kjj'}(\beta)$. Recall that,
for $1 \le k \le q - 1$,

$$m_{kjC} = \frac{p_{kjC}\,P(U_{k+} - U_{kj} = k - 1)}{P(U_{k+} = k)}.$$

Note also that

$$v_{kjjC} = m_{kjC} - m_{kjC}^2$$

and

$$v_{kjj'C} = \frac{p_{kjC}\,p_{kj'C}\,P(U_{k+} - U_{kj} - U_{kj'} = k - 2)}{P(U_{k+} = k)} - m_{kjC}\,m_{kj'C}$$

for $j \ne j'$. Given $\beta$, computation of $\theta_{kC}$ in the definition of $p_{kjC}$ may be accomplished
by use of the Newton-Raphson algorithm customarily used with maximum-likelihood
estimation in log-linear models (Haberman, 1978). For an initial approximation $\theta_{k0}$ of $\theta_{kC}$,
one uses the iteration

$$\theta_{k(t+1)} = \theta_{kt} + \frac{k - \sum_{j=1}^{q} \operatorname{lgt}(\theta_{kt} - \beta_j)}{\sum_{j=1}^{q} w(\theta_{kt} - \beta_j)}.$$

The $\theta_{kt}$ normally converge rapidly to $\theta_{kC}$. There is no need for a precise approximation to
$\theta_{kC}$. Given $\theta_{kC}$, $p_{kjC}$ is easily computed. At this point, computations reduce to the problem
of finding the probability that the sum of independent Bernoulli variables has a specified
value. A simple recursion formula is quite adequate for this purpose. It suffices to consider
$P(U_{k+} = k)$. Arguments for $U_{k+} - U_{kj}$, the sum of $U_{kj'}$ for $j' \ne j$, are essentially the same.
Similar remarks apply to $U_{k+} - U_{kj} - U_{kj'}$ for $j \ne j'$. Let $a_{khi}$ be the probability that
$\sum_{j=1}^{i} U_{kj} = h$ for $0 \le h \le i$ and $1 \le i \le q$. Then $a_{k01} = 1 - p_{k1C}$ and $a_{k11} = p_{k1C}$. For
$0 < i < q$ and $h = 0$,

$$a_{k0(i+1)} = a_{k0i}(1 - p_{k(i+1)C}).$$

For $h = i + 1$,

$$a_{k(i+1)(i+1)} = a_{kii}\,p_{k(i+1)C}.$$

For $1 \le h \le i$,

$$a_{kh(i+1)} = a_{k(h-1)i}\,p_{k(i+1)C} + a_{khi}(1 - p_{k(i+1)C}).$$

Thus $P(U_{k+} = k) = a_{kkq}$. In the course of calculations, it is helpful to note that there
is never a need for $a_{khi}$ for $h > k$ or for $h < k - q + i$. Minor changes in the algorithm
are appropriate for $k > q/2$. In this case, $p_{kiC}$ is replaced by $1 - p_{kiC}$ in the algorithm,
and $P(U_{k+} = k)$ is then $a_{k(q-k)q}$. Alternative approaches to computations can also be
considered that are comparable in terms of computational labor, although these methods
do not involve the scaling procedures used here to prevent sums from becoming excessively
small (Gustafsson, 1980; Liou, 1994).
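The recursion and the scaling by $\theta_{kC}$ can be sketched as follows. This is an illustration under stated assumptions, not the report's program: the function names are hypothetical, `lgt` is assumed to be the logistic function, the weight $w$ is assumed to be $\operatorname{lgt}(1 - \operatorname{lgt})$, the step clamp is an added safeguard, and the truncation to $h \le k$ described above is omitted for brevity. A brute-force enumeration is included only to check the result for small $q$.

```python
import math
from itertools import combinations

def lgt(x):
    """Logistic function, assumed to be lgt(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def solve_theta(beta, k, tol=1e-10, max_iter=100):
    """Newton iteration for theta_k with sum_j lgt(theta_k - beta_j) = k."""
    q = len(beta)
    theta = sum(beta) / q + math.log(k / (q - k))   # illustrative start
    for _ in range(max_iter):
        p = [lgt(theta - b) for b in beta]
        step = (k - sum(p)) / sum(x * (1.0 - x) for x in p)
        step = max(-2.0, min(2.0, step))            # safeguard (assumption)
        theta += step
        if abs(step) < tol:
            break
    return theta

def sum_distribution(p):
    """a[h] = P(sum of independent Bernoulli(p_j) variables equals h)."""
    a = [1.0]
    for pj in p:
        nxt = [0.0] * (len(a) + 1)
        for h, ah in enumerate(a):
            nxt[h] += ah * (1.0 - pj)    # next variable equals 0
            nxt[h + 1] += ah * pj        # next variable equals 1
        a = nxt
    return a

def conditional_means(beta, k):
    """m_kj = p_kj P(U_k+ - U_kj = k - 1) / P(U_k+ = k), for 1 <= k <= q - 1."""
    theta = solve_theta(beta, k)
    p = [lgt(theta - b) for b in beta]
    denom = sum_distribution(p)[k]
    return [pj * sum_distribution(p[:j] + p[j + 1:])[k - 1] / denom
            for j, pj in enumerate(p)]

def conditional_means_brute(beta, k):
    """Direct enumeration of all response patterns with score k (small q only)."""
    q = len(beta)
    total = 0.0
    num = [0.0] * q
    for subset in combinations(range(q), k):
        weight = math.exp(-sum(beta[j] for j in subset))
        total += weight
        for j in subset:
            num[j] += weight
    return [x / total for x in num]
```

The leave-one-out distributions are recomputed from scratch for clarity; a production version would reuse partial results and would avoid the enumeration entirely.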

Given that the table of $f_{kj}$ has already been prepared and given that starting values
based on joint estimation are employed, computation time for the Verbal test with $q = 78$
was 10 seconds and for the Math test with $q = 60$ was 3 seconds on an IBM NetVista
M41 computer with 512 megabytes of RAM running Windows NT. The Intel Pentium 4
processor ran at 1.8 gigahertz. The stopping rule was that $|\beta_{t+1} - \beta_t|$ was no greater than
0.00001. The starting values based on JMLE were reasonably successful. The maximum
difference between joint and conditional maximum-likelihood estimates of a parameter $\beta_j$
was 0.097 for the Verbal test and 0.126 for the Math test. These differences are sufficiently
small to justify use of joint estimation for starting values for conditional estimation. At the
same time, the differences are rather large relative to the estimated asymptotic standard
deviations of the parameters. For instance, $\hat\beta_{jC}$ has an estimated asymptotic standard
deviation no greater than 0.008 in the Verbal and Math examples.

It should be emphasized that the Newton-Raphson algorithm is relatively stable, so 
that a choice of crude starting values will not normally prevent convergence, although it 
may lead to somewhat slower computations. Because the computational cost of JMLE is 
very low, this approach to starting values appears appropriate. 

Estimates of $H_C$ and $H_U$ are readily obtained, together with estimated asymptotic
standard deviations. For the Math test, $\hat H_{CR} = 0.41667$, and $\hat H_{UR} = 0.48040$. The
estimated asymptotic standard deviations of these statistics are quite small, about 0.00063.
For the Verbal test, $\hat H_{CR} = 0.47497$, and $\hat H_{UR} = 0.52630$. The estimated asymptotic
standard deviations are about 0.00071. There is a substantial difference between the
estimated expected log penalties for the Rasch model and for models in which the Rasch
model holds and either ability parameters are constant or item difficulties are constant. For
the Math test, $\hat H_{UI} = 0.55715$ and $\hat H_{UA} = 0.55492$. For the Verbal test, $\hat H_{UI} = 0.58440$ and
$\hat H_{UA} = 0.58355$.


2.3 Generalized Residuals 

In contrast to JMLE, generalized residuals are readily available for CMLE. In a very
general formulation, for each examinee $i$, real weights $d_i(c)$ are assigned for $q$-dimensional
vectors $c$ with coordinates 0 or 1. For $Y_i$ in $\Gamma(k)$, the estimated conditional expected
value of $d_i(Y_i)$ given $Y_{i+} = k$ is

$$\hat d_i = \sum_{c \in \Gamma(k)} d_i(c)\,\hat p_{JC}(c).$$

The observed sum

$$O = \sum_{i=1}^{n} d_i(Y_i)$$

is then compared to the estimated conditional expected value

$$E = \sum_{i=1}^{n} \hat d_i.$$

Verification of regularity conditions is not trivial, but it is somewhat easier in the case in
which $q/n^{1/2}$ approaches 0, a condition that appears appropriate for the Math and Verbal
tests.

Application of adjusted generalized residuals is relatively straightforward, although
some computation is required to obtain the estimated asymptotic variance $\hat\sigma^2(e)$ of
$e = O - E$. If $Y_{i+} = k$, then let

$$\hat d_i = \sum_{c \in \Gamma(k)} d_i(c)\,\hat p_{JC}(c),$$

let

$$\hat u_i = \sum_{c \in \Gamma(k)} [d_i(c) - \hat d_i]^2\,\hat p_{JC}(c),$$

let $\hat m_{Ci}$ be the $q$-dimensional vector with coordinates $m_{kj}(\hat\beta_C)$ for $j$ from 1 to $q$, and let

$$\hat g_i = \sum_{c \in \Gamma(k)} (c - \hat m_{Ci})\,d_i(c)\,\hat p_{JC}(c).$$

Let

$$\hat h = \sum_{i=1}^{n} \hat g_i.$$

Then

$$\hat\sigma^2(e) = \sum_{i=1}^{n} \hat u_i - \hat h^T \hat V^- \hat h.$$

The adjusted residual

$$z = e/\hat\sigma(e)$$

converges in distribution to a standard normal random variable if $h^T (V^*)^- h/\sigma(e)$ is
bounded above and if

$$[\sigma(e)]^{-1} \max_{1 \le i \le n}\,\max_{0 \le k \le q}\,\max_{c \in \Gamma(k)} |d_i(c)|$$

approaches 0, where $\sigma^2(e)$ is defined by the equations

$$d_{ik} = \sum_{c \in \Gamma(k)} d_i(c)\,p_{JC}(c),$$

$$u_i = \sum_{k=0}^{q} \sum_{c \in \Gamma(k)} [d_i(c) - d_{ik}]^2\,p_J(c),$$

$m_{kC}$ is the $q$-dimensional vector with coordinates $m_{kjC} = m_{kj}(\beta)$ for $j$ from 1 to $q$,

$$g_i = \sum_{k=0}^{q} \sum_{c \in \Gamma(k)} (c - m_{kC})\,d_i(c)\,p_J(c),$$

$$h = \sum_{i=1}^{n} g_i,$$

and

$$\sigma^2(e) = \sum_{i=1}^{n} u_i - n^{-1} h^T (V^*)^- h.$$

A good example of an adjusted residual to consider is the sum

$$O_j = \sum_{i=1}^{n} Y_{ij} Y_{i+} = \sum_{k=0}^{q} k f_{kj}.$$

The corresponding estimated expected value is

$$E_j = \sum_{k=0}^{q} k\,n_k \hat m_{kjC},$$

and the unadjusted residual is

$$e_j = O_j - E_j.$$

In effect, this sum leads to examination of the difference between the estimates of the
point-biserial correlation of $Y_{ij}$ and $Y_{i+}$ derived with and without the Rasch model. Let

$$\bar Y_{\cdot+} = n^{-1}\sum_{i=1}^{n} Y_{i+} = n^{-1}\sum_{k=0}^{q} k\,n_k,$$

and let

$$U = \sum_{i=1}^{n} (Y_{i+} - \bar Y_{\cdot+})^2 = \sum_{k=0}^{q} n_k (k - \bar Y_{\cdot+})^2.$$

The standard estimate of the point-biserial correlation is

$$\frac{O_j - Y_{+j}\bar Y_{\cdot+}}{[Y_{+j}(1 - Y_{+j}/n)U]^{1/2}}.$$

Under the model, the estimated point-biserial correlation is

$$\frac{E_j - Y_{+j}\bar Y_{\cdot+}}{[Y_{+j}(1 - Y_{+j}/n)U]^{1/2}}.$$

For any constant $c$,

$$O_j - E_j = \sum_{k=0}^{q} (k - c) f_{kj} - \sum_{k=0}^{q} (k - c)\,n_k \hat m_{kjC}.$$

Let $\hat v_{kjj'}$ be $v_{kjj'}(\hat\beta_C)$. It is a straightforward matter to verify that

$$\hat\sigma^2(e_j) = \sum_{k=0}^{q} n_k (k - \bar Y_{\cdot+})^2\,\hat v_{kjj} - \hat h_j^T \hat V^- \hat h_j,$$

where coordinate $j'$ of $\hat h_j$ is

$$\sum_{k=0}^{q} n_k (k - \bar Y_{\cdot+})\,\hat v_{kjj'}.$$

The adjusted residual for item $j$ is then

$$z_j = e_j/\hat\sigma(e_j).$$

Both application to the Math test and application to the Verbal test provide
overwhelming evidence that the Rasch model cannot hold. In the Math test, the largest
adjusted residual in magnitude is $z_{57} = -158.08$, and only 8 items are associated with an
adjusted residual less than 10 in absolute value. In the Verbal test, $z_{57}$ is $-144.54$, and only
10 adjusted residuals $z_j$ have magnitude less than 10.

The very large adjusted residuals are associated with substantial differences between
observed and fitted point-biserial correlations. In the Math test, the observed point-biserial
correlation for item 57 is 0.319. The fitted value is 0.482. In the Verbal test, the observed
point-biserial correlation for item 57 is 0.234, while the fitted value is 0.401.
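The observed and fitted point-biserial correlations for a single item can be computed from the score counts, the item's cross-classification by score, and the fitted conditional means. The sketch below is a hypothetical helper rather than the report's code; it assumes the inputs are indexed by score $k = 0, \ldots, q$, with the fitted mean equal to 0 at $k = 0$ and 1 at $k = q$.

```python
import math

def point_biserials(n_k, f_kj, m_kj):
    """Observed and fitted point-biserial correlations of item j with the score.

    n_k  : n_k[k] = number of examinees with total score k, k = 0..q
    f_kj : f_kj[k] = number of those examinees answering item j correctly
    m_kj : m_kj[k] = fitted conditional probability of a correct response
    """
    n = sum(n_k)
    y_plus_j = sum(f_kj)                                    # item total Y_{+j}
    mean_score = sum(k * nk for k, nk in enumerate(n_k)) / n
    u = sum(nk * (k - mean_score) ** 2 for k, nk in enumerate(n_k))
    o_j = sum(k * f for k, f in enumerate(f_kj))            # observed sum O_j
    e_j = sum(k * nk * m for k, (nk, m) in enumerate(zip(n_k, m_kj)))
    scale = math.sqrt(y_plus_j * (1.0 - y_plus_j / n) * u)
    observed = (o_j - y_plus_j * mean_score) / scale
    fitted = (e_j - y_plus_j * mean_score) / scale
    return observed, fitted
```

The difference between the two returned values is the unadjusted residual $e_j$ divided by the common scale factor, which is why a large adjusted residual corresponds to a visible gap between observed and fitted correlations.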


2.4 Model Error and Log Penalties 

As with use of JMLE, modifications in results for CMLE must be made if the Rasch
model does not hold. Arguments are quite similar in nature to those for JMLE, so details
are omitted. As in Section 1.7, let the $Y_i$ be independent and identically distributed, and
let the joint probability $p_J(c)$ that $Y_i = c$ be positive for each $q$-dimensional vector $c$ with
coordinates $c_j$ equal to 0 or 1. The conditional probability that $Y_i = c$ in $\Gamma(k)$ given that
$Y_{i+} = k$ is then $p_{JC}(c) = p_J(c)/P_k$. A unique $q$-dimensional vector $\beta$ with coordinates $\beta_j$
exists such that $\beta_1 = 0$ and

$$\sum_{k=0}^{q} P_k m_{kj}(\beta) = p_j.$$

This definition of $\beta$ is consistent with the previous definition of $\beta$ if the Rasch model holds.
Let

$$p_{JCR}(c) = \exp(-\beta^T c)/s_k(\beta), \quad c \in \Gamma(k).$$

Then the expected log penalty per response

$$H_{CR} = -q^{-1}\sum_{k=0}^{q} P_k \sum_{c \in \Gamma(k)} p_{JC}(c) \log p_{JCR}(c)$$

does not exceed the conditional expected log penalty per response

$$-q^{-1}\sum_{k=0}^{q} P_k \sum_{c \in \Gamma(k)} p_{JC}(c) \log p'_{JC}(c)$$

if

$$p'_{JC}(c) = \exp(-\gamma^T c)/s_k(\gamma), \quad c \in \Gamma(k),$$

for a $q$-dimensional vector $\gamma$ with $\gamma_1 = 0$. The minimum expected log penalty per response
$H_{CR}$ is at least as great as the conditional entropy $H_C$ of $Y_i$ given $Y_{i+}$, with equality if, and
only if, $p_{JC}(c) = p_{JCR}(c)$. This equality surely holds if the Rasch model holds. Customary
arguments for log-linear models show that

$$p_{JR}(c) = P_k\,p_{JCR}(c)$$

satisfies the condition that the expected log penalty per response

$$H_{UR} = H_{CR} + H_+ = -q^{-1}\sum_{k=0}^{q} \sum_{c \in \Gamma(k)} p_J(c) \log p_{JR}(c)$$

does not exceed

$$-q^{-1}\sum_{k=0}^{q} \sum_{c \in \Gamma(k)} p_J(c) \log p'_J(c)$$

for

$$p'_J(c) = \exp(\alpha_k - \gamma^T c), \quad c \in \Gamma(k),$$

for a $q$-dimensional vector $\gamma$ with $\gamma_1 = 0$ and some real $\alpha_k$ (Haberman, 1974, chap. 2) for
which

$$\sum_{k=0}^{q} \sum_{c \in \Gamma(k)} p'_J(c) = 1.$$

Clearly $H_{UR} \ge H_U$. The condition $H_{UR} = H_U$ holds if the Rasch model holds.

If the number $q$ of items is fixed, then it is a straightforward matter to verify that
$\hat\beta_C$ converges almost surely to $\beta$ and that $n^{1/2}(\hat\beta_C - \beta)$ converges in distribution to a
multivariate normal random variable with zero mean and with covariance matrix

$$K(V^*)^- V' (V^*)^- K^T.$$

The matrix $V'$ is the expected conditional covariance matrix of $Y_1$ given $Y_{1+}$. This formula
is consistent with the formula obtained if the Rasch model holds, for $V'$ is then $V^*$.

If the number $q$ of items increases, then, as in the case of JMLE, wording of results
must be modified due to dependence of $\beta_j$ on $q$. It is simplest to assume that the $\beta_j$ are
uniformly bounded as $q$ increases, as is the case if the Rasch model is valid. Assume that
positive constants $a_1$ and $a_2$ exist such that

$$a_1 x^T x \le x^T V' x \le a_2 x^T x$$

for any $q > 1$ and any $q$-dimensional vector $x$ such that $\sum_{j=1}^{q} x_j = 0$. Then $|\hat\beta_C - \beta|$ converges
in probability to 0. If $\sigma_e^2(\hat\beta_{jC})$ is the element in row $j$ and column $j$ of the matrix
$K(V^*)^- V' (V^*)^- K^T$, then the standardized value $n^{1/2}(\hat\beta_{jC} - \beta_j)/\sigma_e(\hat\beta_{jC})$ converges in
distribution to a standard normal deviate. It should be noted that results for an invalid model
do approach those for a valid model as the $m_{kjC}$ approach $m_{kj}(\beta)$ for some $\beta$ such that $\beta_1 = 0$.

The minimum conditional expected penalty $H_{CR}$ may be estimated by $\hat H_{CR}$. For
$q$ fixed, $\hat H_{CR}$ converges to $H_{CR}$ with probability 1, and $\hat H_{UR}$ converges to $H_{UR}$ with
probability 1. For $q$ increasing, $\hat H_{CR} - H_{CR}$ and $\hat H_{UR} - H_{UR}$ converge in probability to
0. Normal approximations are readily available, at least if $q/n$ approaches 0. Let $\sigma(H_{UR})$
be the standard deviation of $q^{-1}\log p_{JR}(Y_1)$, and let $\sigma(H_{CR})$ be the standard deviation of
$q^{-1}\log p_{JCR}(Y_1)$. For a fixed number $q$ of items, $n^{1/2}(\hat H_{UR} - H_{UR})$ converges in distribution
to a normal random variable with mean 0 and variance $\sigma^2(H_{UR})$, and $n^{1/2}(\hat H_{CR} - H_{CR})$
converges in distribution to a normal random variable with mean 0 and variance $\sigma^2(H_{CR})$.
For $q$ increasing, results are not quite as convenient as in the case in which the Rasch
model holds. Provided that $\sigma(H_{CR})$ and $\sigma(H_{UR})$ are bounded below by a positive constant,
$n^{1/2}(\hat H_{UR} - H_{UR})/\sigma(H_{UR})$ and $n^{1/2}(\hat H_{CR} - H_{CR})/\sigma(H_{CR})$ both converge in distribution to
a standard normal random variable. These results are readily applied to construction of
approximate confidence intervals for $H_{CR}$ and $H_{UR}$ (Gilula & Haberman, 1995). Indeed, the
same approach to approximate confidence intervals that applies if the Rasch model holds
continues to apply even if the model is not valid.

For a rather small number $q$ of items, estimation of the entropies $H_C$ and $H_U$ may be
accomplished without any assumptions concerning validity of the Rasch model. One may
estimate $H_C$ by

$$\hat H_C = -(nq)^{-1}\sum_{k=1}^{q-1} \sum_{c \in \Gamma(k)} N(c) \log[N(c)/n_k]$$

and $H_U$ by

$$\hat H_U = -(nq)^{-1}\sum_{k=0}^{q} \sum_{c \in \Gamma(k)} N(c) \log[N(c)/n].$$

For fixed $q$, $\hat H_C$ converges almost surely to $H_C$, and $\hat H_U$ converges almost surely to $H_U$.
Thus a comparison with the estimates $\hat H_{CR}$ and $\hat H_{UR}$ indicates the loss of predictive
power due to use of the Rasch model. Approximate confidence intervals for the difference
$H_{CR} - H_C = H_{UR} - H_U$ are readily available (Gilula & Haberman, 1994; Gilula &
Haberman, 1995). In addition, formal chi-square tests for validity of the Rasch model are
available in this case. If the Rasch model holds, the standard likelihood-ratio chi-square statistic

$$L^2 = 2nq(\hat H_{UR} - \hat H_U) = 2nq(\hat H_{CR} - \hat H_C)$$

converges in distribution to a chi-square random variable with $2^q - 2q$ degrees of freedom.
If $H_{UR} > H_U$, so that the Rasch model fails, then $L^2/n$ converges almost surely to
$2q(H_{UR} - H_U) > 0$.

Application of the chi-square test is limited to relatively small subtests of the Math and
Verbal tests. Presumably $np_J(c)$ should be at least 1 for $c$ in $\Gamma(k)$ for $1 \le k \le q - 1$ and 80
percent of such $np_J(c)$ should be at least 5 (Cochran, 1954). In practice, the sample size $n$
should be relatively large compared to $2^q$. Nonetheless, it should be noted that the Rasch
model for a complete Math or Verbal test can only hold if it holds for a subtest consisting
of a selected group of items. For the first five questions in the Verbal test, $\hat H_{UR}$ is 0.411 and
$\hat H_U$ is 0.410, so that the actual information loss appears small. Nonetheless, overwhelming
evidence exists that the Rasch model cannot hold, for $L^2 = 4{,}175.5$, and there are only
$2^5 - 2(5) = 22$ degrees of freedom. This pattern is repeated for other selections of items.
For the first five Math items, $\hat H_{UR} = 0.344$, $\hat H_U = 0.343$, and $L^2 = 4{,}580.5$. For the last five
Verbal items, $\hat H_{UR} = 0.597$, $\hat H_U = 0.593$, and $L^2 = 12{,}257.7$. For the last five Math items,
$\hat H_{UR} = 0.604$, $\hat H_U = 0.602$, and $L^2 = 7{,}239.4$.
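For a small subtest, the model-free entropy estimate and the likelihood-ratio statistic can be computed from the observed response patterns. The sketch below is illustrative: the function names are hypothetical, and the Rasch-based estimate is assumed to be supplied from a separate fit.

```python
import math
from collections import Counter

def saturated_entropy(responses):
    """Model-free estimate -(nq)^-1 sum_c N(c) log[N(c)/n] over observed patterns.

    responses : list of equal-length sequences of 0/1 item responses
    """
    n = len(responses)
    q = len(responses[0])
    counts = Counter(tuple(r) for r in responses)   # N(c) for each pattern c
    return -sum(c * math.log(c / n) for c in counts.values()) / (n * q)

def likelihood_ratio(n, q, h_ur_hat, h_u_hat):
    """Return (L2, degrees of freedom) for the test against the saturated model.

    h_ur_hat : estimated expected log penalty under the Rasch model
    h_u_hat  : model-free estimate from saturated_entropy
    """
    l2 = 2.0 * n * q * (h_ur_hat - h_u_hat)
    df = 2 ** q - 2 * q
    return l2, df
```

Because the degrees of freedom grow as $2^q$, this calculation is practical only for a handful of items, which is why the examples above use five-item subtests.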

An alternative approach to studying lack of fit can be based on a simple variant of the
Rasch model in which it is assumed that, for some $q$-dimensional vectors $\beta$ and $\gamma$ with
initial coordinates 0,

$$p_{JC}(c) = \frac{\exp\{-[\beta + q^{-1}(k - q/2)\gamma]^T c\}}{s_k(\beta + q^{-1}(k - q/2)\gamma)}$$

for $c$ in $\Gamma(k)$. This variant includes the Rasch model as a special case, as is evident by
consideration of $\gamma$ equal to 0. This model may be analyzed through the same conditional
arguments used in the Rasch model. The resulting estimates $\hat\beta_{CL}$, $\hat\gamma_{CL}$, and

$$\hat m_{kjCL} = m_{kj}(\hat\beta_{CL} + q^{-1}(k - q/2)\hat\gamma_{CL})$$

satisfy the constraints

$$\sum_{k=0}^{q} n_k \hat m_{kjCL} = Y_{+j},$$

$$\sum_{k=0}^{q} (k - q/2)\,n_k \hat m_{kjCL} = \sum_{k=0}^{q} (k - q/2) f_{kj},$$

and

$$\sum_{j=1}^{q} \hat m_{kjCL} = k.$$

A notable feature for the new model is that fitted and observed point-biserial correlations
of $Y_{ij}$ and $Y_{i+}$ are the same. Given that the ability variable $\theta_1$ has positive variance,
consistency and asymptotic normality results are readily established both for the case in
which the model holds and for the case in which the model fails, and the Newton-Raphson
algorithm remains applicable. Under the new model, the minimum expected logarithmic
penalty per observation for prediction of $Y_i$ by $Y_{i+}$ is

$$H_{CL} = -q^{-1}\sum_{k=0}^{q} P_k \sum_{c \in \Gamma(k)} p_{JC}(c) \log p_{JCL}(c),$$

where $p_{JCL}(c)$ is defined so that

$$p_{JCL}(c) = \frac{\exp\{-[\beta_L + q^{-1}(k - q/2)\gamma_L]^T c\}}{s_k(\beta_L + q^{-1}(k - q/2)\gamma_L)},$$

$$m_{kjCL} = m_{kj}(\beta_L + q^{-1}(k - q/2)\gamma_L),$$

$$\sum_{k=0}^{q} P_k m_{kjCL} = p_j,$$

$$\sum_{k=0}^{q} (k - q/2) P_k m_{kjCL} = \sum_{k=0}^{q} (k - q/2) P_k m_{kjC},$$

and

$$\sum_{j=1}^{q} m_{kjCL} = k.$$

The corresponding minimum expected log penalty for prediction of $Y_i$ is

$$H_{UL} = H_{CL} + H_+.$$

If

$$\hat p_{JCL}(c) = \frac{\exp\{-[\hat\beta_{CL} + q^{-1}(k - q/2)\hat\gamma_{CL}]^T c\}}{s_k(\hat\beta_{CL} + q^{-1}(k - q/2)\hat\gamma_{CL})}$$

for $c$ in $\Gamma(k)$, then one obtains the estimated expected log penalties

$$\hat H_{CL} = -(nq)^{-1}\sum_{i=1}^{n} \log \hat p_{JCL}(Y_i)$$

and

$$\hat H_{UL} = \hat H_{CL} + \hat H_+.$$


For the Verbal test, $\hat H_{UL} = 0.52098$ is rather close to $\hat H_{UR} = 0.52630$. For the Math
test, $\hat H_{UL} = 0.47500$ is rather close to $\hat H_{UR} = 0.48040$. Despite the closeness, there is
overwhelming evidence based on these comparisons that the Rasch model cannot hold.
To verify this claim, consider the likelihood-ratio chi-square statistic

$$L_L^2 = 2nq(\hat H_{UR} - \hat H_{UL}) = 2nq(\hat H_{CR} - \hat H_{CL}).$$

If the Rasch model holds and $q$ is fixed, then a straightforward application of general
results for log-linear models permits a demonstration that $L_L^2$ converges in distribution to
a chi-square random variable with $q - 1$ degrees of freedom (Haberman, 1974, chap. 4).

A more complicated case has $q$ increasing but $q^2/n$ approaching 0. In this case, one may
show that $(L_L^2 - q + 1)/[2(q - 1)]^{1/2}$ converges in distribution to a standard normal random
variable (Haberman, 1977b; Haberman, 1977a; Portnoy, 1988). The normal approximation
supports use of the chi-square approximation, for a chi-square random variable $\chi^2_\nu$ with $\nu$
degrees of freedom satisfies the condition that $(\chi^2_\nu - \nu)/(2\nu)^{1/2}$ converges in distribution to
a standard normal random variable as $\nu$ approaches $\infty$.

For the case under study, the Math test yields $L_L^2 = 289{,}401$ on 59 degrees of freedom,
and the Verbal test yields $L_L^2$ of 370,648 on 77 degrees of freedom, so that the test statistics
provide overwhelming evidence that the Rasch model does not hold. Note that it has been
shown that the Rasch model is not valid for either test; however, no demonstration has been
made that the model error is very large in terms of prediction of the response vector $Y_i$.
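The size of these statistics relative to their degrees of freedom can be seen through the normal approximation to the chi-square distribution. The helper below is a hypothetical illustration that standardizes a statistic by its degrees of freedom; the inputs are the values reported above.

```python
import math

def standardized_chi_square(stat, df):
    """Return (stat - df) / sqrt(2 df), approximately standard normal for large df."""
    return (stat - df) / math.sqrt(2.0 * df)

# Values reported above for the Math and Verbal tests.
z_math = standardized_chi_square(289401.0, 59)
z_verbal = standardized_chi_square(370648.0, 77)
```

Both standardized values are in the tens of thousands, far beyond any conventional critical value, which matches the conclusion that the Rasch model does not hold for either test.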

Other comparable likelihood-ratio chi-square tests can and have been constructed 
(Andersen, 1973b). The particular test chosen has been emphasized because the number of 
degrees of freedom is relatively small and regularity conditions appear reasonable for the 
sample sizes and numbers of items under consideration. 

3 Latent Structures and Log-linear Models 

There is a subtle difficulty encountered in Section 2 in distinguishing between a Rasch
model and a log-linear model. For a Rasch model,

$$p_J(c) = P_k \exp(-\beta^T c)/s_k(\beta) \qquad (18)$$

for some $q$-dimensional vector $\beta$ with $\beta_1 = 0$ and some distribution function $D$ for which

$$P_k = s_k(\beta)\int_{-\infty}^{\infty} e^{k\theta}\,\Psi(\beta, \theta)\,dD(\theta). \qquad (19)$$

In the corresponding log-linear model,

$$p_J(c) = \exp(\alpha_k - \beta^T c), \quad c \in \Gamma(k),$$

for a $q$-dimensional vector $\beta$ with $\beta_1 = 0$ and some real $\alpha_k$ for which

$$\sum_{k=0}^{q} \exp(\alpha_k)\,s_k(\beta) = \sum_{k=0}^{q} \sum_{c \in \Gamma(k)} p_J(c) = 1 \qquad (20)$$

(Tjur, 1982). If (18) holds, then the log-linear model holds with

$$\exp(\alpha_k) = \int_{-\infty}^{\infty} e^{k\theta}\,\Psi(\beta, \theta)\,dD(\theta). \qquad (21)$$

Thus the Rasch model implies the log-linear model. If $Z$ is a positive random variable such
that the distribution function of $\log Z$ has value

$$\frac{\int_{-\infty}^{x} \Psi(\beta, \theta)\,dD(\theta)}{\int_{-\infty}^{\infty} \Psi(\beta, \theta)\,dD(\theta)}$$

for $x$ in $R$, then $\exp(\alpha_k - \alpha_0) = E(Z^k)$.

If the log-linear model holds and a positive random variable $Z$ exists such that
$E(Z^k) = \exp(\alpha_k - \alpha_0)$ for $0 \le k \le q$, then the Rasch model holds (Cressie & Holland,
1983), for one may let $A$ be the distribution function of $\log Z$ and let $D$ be the distribution
function such that

$$D(x) = \exp(\alpha_0)\int_{-\infty}^{x} [\Psi(\beta, \theta)]^{-1}\,dA(\theta)$$

for real $x$. In this case, the equation

$$[\Psi(\beta, \theta)]^{-1} = \sum_{k=0}^{q} s_k(\beta)\,e^{k\theta}$$

and (20) imply that

$$\exp(-\alpha_0) = \int_{-\infty}^{\infty} [\Psi(\beta, \theta)]^{-1}\,dA(\theta).$$

It then follows that (21) holds, so that (19) holds.

Classical results concerning existence of moments of positive random variables may be
used to indicate whether a particular choice of $\alpha_k$, $0 \le k \le q$, corresponds to a suitable
positive random variable $Z$ such that $\exp(\alpha_k - \alpha_0) = E(Z^k)$ for $1 \le k \le q$. It essentially
suffices to consider whether two matrices are positive definite or nonnegative definite. If $q$
is even, let $s = q/2$ and $r = s + 1$. If $q$ is odd, let $r = s = (q + 1)/2$. Let the log-linear
model hold. Let $A$ be the $r$ by $r$ matrix with row $i$ and column $j$ equal to $\exp(\alpha_{i+j-2} - \alpha_0)$,
and let $B$ be the $s$ by $s$ matrix with row $i$ and column $j$ equal to $\exp(\alpha_{i+j-1} - \alpha_0)$. The
Rasch model can only hold if $A$ and $B$ are nonnegative definite, and the Rasch model can
only hold for a continuous distribution function $D$ of $\theta_1$ if $A$ and $B$ are positive definite.
On the other hand, if $A$ and $B$ are positive definite, then the Rasch model holds for some
distribution function $D$ (Cressie & Holland, 1983). In addition, the observed $P_k$ and $\beta$ are
consistent with a distribution function $D$ corresponding to a random variable with mass
confined to no more than $s$ points (Karlin & Studden, 1966, pp. 44, 173). The observed $P_k$
are also consistent with other distribution functions $D$ (Lindsay et al., 1991).
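The moment condition can be checked numerically by forming the two matrices and testing them for positive definiteness. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and a plain Cholesky factorization with a small pivot tolerance stands in for whatever numerical test is preferred in practice.

```python
import math

def moment_matrices(alpha):
    """Build A (r by r) and B (s by s) from alpha[k], k = 0..q.

    In the 1-based indexing of the text, A has entries exp(alpha_{i+j-2} - alpha_0)
    and B has entries exp(alpha_{i+j-1} - alpha_0).
    """
    q = len(alpha) - 1
    if q % 2 == 0:
        s = q // 2
        r = s + 1
    else:
        r = s = (q + 1) // 2
    mom = [math.exp(a - alpha[0]) for a in alpha]   # candidate moments E(Z^k)
    a_mat = [[mom[i + j] for j in range(r)] for i in range(r)]
    b_mat = [[mom[i + j + 1] for j in range(s)] for i in range(s)]
    return a_mat, b_mat

def is_positive_definite(m, tol=1e-12):
    """Attempt a Cholesky factorization; succeed only if every pivot exceeds tol."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            val = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if val <= tol:
                    return False
                low[i][i] = math.sqrt(val)
            else:
                low[i][j] = val / low[j][j]
    return True
```

Matrices of this kind become badly conditioned as $q$ grows, so for long tests the outcome of such a check is sensitive to rounding and to the tolerance chosen.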

These results lead to a relatively simple relationship between unconstrained maximum-likelihood
estimates and conditional maximum-likelihood estimates. If the conditional
maximum-likelihood estimate $\hat\beta_C$ exists, if each $n_k$ is positive, if $\hat P_k = n_k/n$, if

$$\hat\alpha_k = \log[\hat P_k/s_k(\hat\beta_C)],$$

and if

$$\exp(\hat\alpha_k - \hat\alpha_0) = E(Z^k), \quad 0 \le k \le q, \qquad (22)$$

for some positive random variable $Z$, then $\hat\ell_M = \hat\ell_{CM} + \hat\ell_{SM}$, and $\hat\beta_C$ is the unique marginal
maximum-likelihood estimate $\hat\beta_J$ of $\beta$. If $\hat D_J$ is a distribution function such that

$$\hat P_k = s_k(\hat\beta_C)\int e^{k\theta}\,\Psi(\hat\beta_C, \theta)\,d\hat D_J(\theta),$$

then $\hat D_J$ is a marginal maximum-likelihood estimate of $D$. The estimate $\hat D_J$ is not uniquely defined.
If no positive random variable $Z$ exists such that (22) holds, then $\hat\beta_C$ need not be an
unconstrained marginal maximum-likelihood estimate of $\beta$.

As a practical matter, the difference between a conditional maximum-likelihood
estimate of $\beta$ and an unconstrained marginal maximum-likelihood estimate of $\beta$ appears
to be small. If the Rasch model holds, the number $q$ of items is fixed, and $A$ and $B$
are positive definite, then standard continuity properties of eigenvalues imply that the
probability approaches 1 that $\hat\beta_C$ is the unique unconstrained marginal maximum-likelihood
estimate of $\beta$ (Wilkinson, 1965, chap. 2). This situation can also apply if $n/2^q$ becomes
large; however, this condition is clearly not relevant for the SAT I data under study. More
typically, if $q$ is large, then some of the products $nP_k$ are quite small, even if $n$ is very
large. Thus it is possible that the probability does not approach 1 that $\hat\beta_C$ is the unique
unconstrained marginal maximum-likelihood estimate of $\beta$. This result does not prevent
effective estimation of $\beta$ or $D$. The large-sample properties of the $\hat\beta_C$ are basically the
same properties obtained if maximum-likelihood is applied in the case in which the $\theta_i$ are
known. It remains the case that the empirical distribution function $\hat D$ does approximate
the distribution function $D$.

There does exist a possibility that the log-linear model holds but the Rasch model does 
not, and tests of goodness of fit, which are really based on the log-linear model, will not 
detect this situation. 


4 Conclusions 

The results derived in the preceding sections suggest that CMLE provides an effective 
approach for analysis of the Rasch model for dichotomous items even in cases in which both 
the sample size and the number of items are large. An efficient approach for computation 
of conditional maximum-likelihood estimates has been derived that is considerably faster 
than previous algorithms. Standard large-sample approximations for the distributions of 
conditional maximum-likelihood estimates have been shown to apply, so that asymptotic 
confidence intervals are available. In addition, methods for residual analysis have been 
developed based on CMLE, and formal tests of goodness of fit have been produced that 
have effectively demonstrated lack of fit for the SAT I data under study. 

Efforts have also been made to apply JMLE under realistic conditions. Results 
have been somewhat less satisfactory. Significant problems of asymptotic bias of joint 
maximum-likelihood estimates were detected for the SAT I data, generalized residual 
analysis was much more limited, goodness of fit was much less readily tested, and the 
analyses based on expected logarithmic penalty were more affected by bias problems. 
Nonetheless, JMLE was found to provide an effective approach for calculation of starting 
values for CMLE, and it was adequate for study of individual deviations from the Rasch
model.

The analysis has also considered evaluation of the value of the Rasch model in terms 
of effective prediction of the pattern of item responses. This analysis has been based on 
the criterion of expected logarithmic penalty. It has suggested, but not conclusively proven,
that, even though the Rasch model is not valid for the SAT I data under study, the error
of the Rasch model is relatively small. Further work on alternative models is needed to 
consider this issue more thoroughly; however, the basic issue is that small errors in models 
are readily detected if sample sizes are sufficiently large. 

The methods of analysis developed in this report can be applied more generally.
The most immediate generalizations involve the Rasch model for polytomous responses.
The JMLE approach has application to 2PL and 3PL models, although presumably the
limitations observed for the Rasch model will be even more severe in these cases.

The alternative to the Rasch model used in testing goodness of fit is not a latent- 
structure model; however, it is easily implemented and has potential use by itself. Of 
particular interest is the question of how this model compares to common psychometric 
models in terms of predictive power. Such comparisons will require analysis of marginal 
models for the case of large sample sizes and large numbers of items. Obviously much work 
remains. 

Appendix
Proofs of Results 


Proof that $V_{jj'k}(\beta) < 0$ if $j \neq j'$ and $0 < k < q$.

For any random variables A and B with values 0 and 1, if $P(A = a, B = b) = \tau_{ab}$, then
the covariance of A and B is

$$\tau_{11}(\tau_{11} + \tau_{10} + \tau_{01} + \tau_{00}) - (\tau_{11} + \tau_{10})(\tau_{11} + \tau_{01}) = \tau_{11}\tau_{00} - \tau_{01}\tau_{10}.$$

For k = 1, it is trivially true that $V_{jj'k}(\beta) < 0$ for $j \neq j'$, for $Y_{ij} = Y_{ij'} = 1$ cannot hold if
$Y_{i+} = 1$. For k > 1, consider j and j' such that $1 \le j < j' \le q$. Then the probability that
$Y_{ij} = a$ and $Y_{ij'} = b$ given that $Y_{i+} = k$ is

$$h_{abjj'k}(\beta) = \frac{\sum_{c \in \Gamma'(a,b,j,j',k)} \exp(-\beta'c)}{\sum_{c \in \Gamma(k)} \exp(-\beta'c)},$$

where $\Gamma'(a,b,j,j',k)$ consists of c in $\Gamma(k)$ such that $c_j = a$ and $c_{j'} = b$. Thus the conditional
covariance of $Y_{ij}$ and $Y_{ij'}$ given $Y_{i+} = k$, $2 \le k < q$, is $h_{11jj'k}h_{00jj'k} - h_{01jj'k}h_{10jj'k}$. Let
$G(j,j',k)$ be the population of q-dimensional vectors d with nonnegative integer coordinates
$d_h \le 2$ such that $\sum_{h=1}^{q} d_h = 2k$ and $d_j = d_{j'} = 1$. If c is in $\Gamma'(1,1,j,j',k)$ and e is in
$\Gamma'(0,0,j,j',k)$, then c + e is in $G(j,j',k)$. Similarly, if c is in $\Gamma'(1,0,j,j',k)$ and e is in
$\Gamma'(0,1,j,j',k)$, then c + e is in $G(j,j',k)$. For each d in $G(j,j',k)$, let u(d) be half the
number of coordinates $d_h = 1$ for $1 \le h \le q$, $h \neq j$, and $h \neq j'$. Note that u(d) must be an
integer. To any d in $G(j,j',k)$ correspond

$$g_1(d) = \frac{[2u(d)]!}{[u(d)!]^2}$$

pairs of c in $\Gamma'(1,0,j,j',k)$ and e in $\Gamma'(0,1,j,j',k)$ such that c + e = d and

$$g_2(d) = \frac{[2u(d)]!}{[u(d)+1]!\,[u(d)-1]!}$$

pairs of c in $\Gamma'(1,1,j,j',k)$ and e in $\Gamma'(0,0,j,j',k)$ such that c + e = d, where $g_2(d) = 0$
if u(d) = 0. Thus

$$V_{jj'k}(\beta) = \frac{\sum_{d \in G(j,j',k)} [g_2(d) - g_1(d)] \exp(-\beta'd)}{\left[\sum_{c \in \Gamma(k)} \exp(-\beta'c)\right]^2}.$$

The ratio

$$\frac{g_2(d)}{g_1(d)} = \frac{u(d)}{u(d)+1} < 1,$$

so that $V_{jj'k}(\beta) < 0$.
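
The result can be checked numerically for a small number of items. The sketch below is a brute-force illustration rather than part of the proof: it enumerates every response pattern with k correct responses, weights each pattern c by exp(-β'c) (an assumed sign convention that does not affect the conclusion), and computes the conditional covariance directly. The item parameters and function name are hypothetical, and items are indexed from 0.

```python
from itertools import combinations
from math import comb, exp

def conditional_covariance(beta, j, jp, k):
    """Covariance of items j and jp given total score k, by full enumeration."""
    q = len(beta)
    total = 0.0
    h = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    for ones in combinations(range(q), k):
        weight = exp(-sum(beta[i] for i in ones))  # exp(-beta'c), assumed sign convention
        total += weight
        h[(int(j in ones), int(jp in ones))] += weight
    h = {key: value / total for key, value in h.items()}
    # Covariance of two binary variables: tau_11 * tau_00 - tau_01 * tau_10.
    return h[(1, 1)] * h[(0, 0)] - h[(0, 1)] * h[(1, 0)]

beta = [-1.0, -0.3, 0.2, 0.8, 1.5]  # hypothetical item parameters, q = 5
covs = [conditional_covariance(beta, j, jp, k)
        for k in range(1, len(beta))
        for j, jp in combinations(range(len(beta)), 2)]
print(max(covs) < 0)

# Counting step of the proof: C(2u, u - 1) / C(2u, u) = u / (u + 1) for u >= 1.
print(all(comb(2 * u, u - 1) * (u + 1) == comb(2 * u, u) * u for u in range(1, 10)))
```

Enumeration grows exponentially in the number of items, so this check is practical only for small q.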



